NUMERICAL METHODS: 1
ITERATION, PROGRAMMING AND ALGEBRAIC EQUATIONS
B. NOBLE, D.Sc.
UNIVERSITY MATHEMATICAL TEXTS
General Editors: Alexander C. Aitken, D.Sc., F.R.S.; D. E. Rutherford, D.Sc., Dr.Math.
OLIVER AND BOYD LTD
UNIVERSITY MATHEMATICAL TEXTS
GENERAL EDITORS
ALEXANDER C. AITKEN, D.Sc., F.R.S.
DANIEL E. RUTHERFORD, D.Sc., Dr.Math.
ASSISTANT EDITOR
IAIN T. ADAMSON, Ph.D.
29
1. Determinants and Matrices ... A. C. Aitken, D.Sc., F.R.S.
2. Statistical Mathematics ... A. C. Aitken, D.Sc., F.R.S.
3. Integration ... R. P. Gillespie, Ph.D.
4. Integration of Ordinary Differential Equations ... E. L. Ince, D.Sc.
5. Vector Methods ... D. E. Rutherford, D.Sc., Dr.Math.
6. Theory of Equations ... H. W. Turnbull, F.R.S.
7. Functions of a Complex Variable ... E. G. Phillips, M.A., M.Sc.
8. Waves ... C. A. Coulson, M.A., D.Sc., F.R.S.
9. Infinite Series ... J. M. Hyslop, D.Sc.
10. Analytical Geometry of Three Dimensions ... W. H. McCrea, Ph.D., F.R.S.
11. Electricity ... C. A. Coulson, M.A., D.Sc., F.R.S.
12. Projective Geometry ... T. E. Faulkner, Ph.D.
13. Introduction to the Theory of Finite Groups ... W. Ledermann, Ph.D., D.Sc.
14. Classical Mechanics ... D. E. Rutherford, D.Sc., Dr.Math.
15. Partial Differentiation ... R. P. Gillespie, Ph.D.
16. Volume and Integral ... W. W. Rogosinski, Dr.Phil., F.R.S.
17. Tensor Calculus ... B. Spain, Ph.D.
18. German-English Mathematical Vocabulary ... S. Macintyre, Ph.D., and E. Witte, M.A.
19. Special Functions of Mathematical Physics and Chemistry ... I. N. Sneddon, D.Sc.
20. Topology ... E. M. Patterson, Ph.D.
21. The Theory of Ordinary Differential Equations ... J. C. Burkill, Sc.D., F.R.S.
22. Fluid Dynamics ... D. E. Rutherford, D.Sc., Dr.Math.
23. Special Relativity ... W. Rindler, Ph.D.
24. Real Variable ... J. M. Hyslop, D.Sc.
25. Russian Reader in Pure and Applied Mathematics ... P. H. Nidditch, Ph.D.
26. Russian-English Mathematical Vocabulary ... J. Burlak, M.Sc., Ph.D., and K. Brooke, M.A.
27. Introduction to Field Theory ... Iain T. Adamson, Ph.D.
28. Number Theory ... J. Hunter, Ph.D.
29. Numerical Methods: 1. Iteration, Programming and Algebraic Equations ... B. Noble, D.Sc.
30. Numerical Methods: 2. Differences, Integration and Differential Equations ... B. Noble, D.Sc.
31. Elementary Abstract Algebra ... E. M. Patterson, Ph.D., and D. E. Rutherford, D.Sc., Dr.Math.
32. Vector Spaces of Finite Dimension ... G. C. Shephard, M.A., Ph.D.
33. Magnetohydrodynamics ... A. Jeffrey, Ph.D.
NUMERICAL METHODS
ITERATION, PROGRAMMING AND ALGEBRAIC EQUATIONS
BEN NOBLE, M.A., B.Sc., D.Sc., A.M.I.E.E.
THE MATHEMATICS RESEARCH CENTER, U.S. ARMY, THE UNIVERSITY OF WISCONSIN, MADISON, WISCONSIN, U.S.A. (FORMERLY OF THE DEPARTMENT OF MATHEMATICS, THE ROYAL COLLEGE OF SCIENCE AND TECHNOLOGY, GLASGOW, SCOTLAND)
OLIVER AND BOYD
EDINBURGH AND LONDON
NEW YORK: INTERSCIENCE PUBLISHERS INC.
OLIVER AND BOYD LTD
Tweeddale Court, Edinburgh 1
39a Welbeck Street, London W.1
First published 1964
Reprinted 1966
© 1964 B. Noble
Printed in Great Britain by Oliver and Boyd Ltd., Edinburgh
PREFACE
It is desirable that all students of applied mathematics, the
physical sciences, and engineering should be given an
introduction to numerical analysis. The aim of this book
is to provide, for this purpose, an elementary exposition of
certain basic ideas. This has been done by considering a
strictly limited selection of typical methods with particular
emphasis on fundamentals, including error analysis, error
estimation, pitfalls, and the detection of mistakes. An
effort has been made to avoid the common practice of
regarding numerical methods as a collection of algorithms
or recipes. Rather than use a given amount of space to
describe several different methods for solving the same type
of problem, it has been thought preferable to illustrate
general principles by discussing one single method in some
detail. Numerical analysis should be a source of useful
tools but it should also make the student think.
The advent of automatic computers has produced a
revolution in computing. For this reason programming
for computers has been made an integral part of the
exposition. Although it is essential to teach the basic ideas
of programming, the author believes that the time spent on
the technical details in an elementary course should be cut
to a minimum. There are advantages in teaching a simple
general form of automatic programming or autocode of the
type developed in Chapter III, which need occupy no more
than one or two lectures. The system is not applicable to
any particular machine but it contains many of the features
common to autocodes. If it is preferred to teach a program-
ming system used by an existing machine, the programs in
this book can easily be converted into the appropriate
autocode. (The word " program " is used in the technical
sense of a set of instructions specifying a procedure or
programme for solving a problem on a real or hypothetical
automatic computer.)
An automatic computer is indispensable for any institu-
tion concerned with the advanced teaching of scientists, but
the place which should be occupied by computers in teaching
elementary numerical analysis is likely to remain the subject
of debate for many years. One can visualise courses in
which all the practical work is done on automatic com-
puters, and equally successful courses in which only hand
machines are used. It is instructive and stimulating for a
student to see a computer in operation but he need not
become a programmer to learn about numerical analysis.
Hand computing and automatic computing may require
different habits of thought, but the basic principles are
common and these can almost always be illustrated by
simple examples. Difficulties in large-scale calculations are
not usually due to the emergence of some radically new
kind of error. The main point is that all students should
have first-hand experience of numerical calculation, how-
ever this is obtained. There is usually a " correct " answer
to a problem in numerical mathematics in a precisely
defined sense. The onus is on the person specifying the
calculating procedure to ensure that the information
obtained during the computation will indicate in what sense,
if any, the calculation has resulted in an approximation to
the " correct " answer. It may help to impress this upon
the student if he sees an automatic computer produce,
without warning, numbers which are meaningless and
wrong, as the result of a program which he himself has
written.
When all computations have to be performed by hand,
the student tends to devote a major part of his effort to the
technical details of numerical procedures. When the
attitude is adopted that most calculations will be performed
by computer, the emphasis in introductory courses can be
shifted from the details of methods to the theory behind
classes of methods. Though the student should have some
acquaintance with the details of typical procedures, there is
little point in teaching complicated algorithms to students
who will normally meet such algorithms only in standard
computer programs devised by experts. Thus the author
would not normally teach Bairstow's method for finding the
complex roots of polynomials (§ 2.7) in an elementary
course.
Many examination questions on numerical methods are
designed to test powers of regurgitation rather than under-
standing. The standard questions beginning " Describe a
method for . . ." and " Find the solution of . . ." should
not exclude other types of question, for example : " The
calculations described below gave the following results. . . .
Find out whether the required solution of the problem can
be deduced from these results. In particular estimate the
effect of truncation error (or round-off error, or instability
of the method, or ill-conditioning of the original problem).
Suggest methods for checking your conclusions."
It seems to be common practice to teach numerical
analysis and basic mathematics in separate courses, and to
postpone numerical analysis to a late stage —even a post-
graduate stage — in a student's career. The author believes
that this is wrong in principle and that much of the material
in this book should ultimately be integrated into courses on
basic mathematics for applied scientists. The material
should be used to illustrate the mathematics, and to provide
motivation and stimulation for more mathematics.
Operator methods for the derivation of finite-difference
formulae have been omitted completely. Experience in
teaching indicates that students quickly acquire facility in
obtaining complicated formulae by operators without
having much idea of the limitations or implications of their
results. It seems preferable in an introductory course to
concentrate attention on the basic underlying principle,
namely that functions are being represented by polynomials
through equally spaced points. Other types of functional
representation can be introduced as natural alternatives.
Most error estimates have been obtained in the form
of a dominant term followed by a series of dots. Little
attempt has been made to obtain closed error estimates by
using a Taylor series with a remainder term, or mean-value
theorems, since at an elementary level this complicates the
analysis without contributing materially to the understand-
ing of either the method or the results. Questions of
elegance and rigour enter into the discussion of a numerical
method but this book is not aimed at the pure mathematician.
In some ways numerical analysis should be approached as
if it were an experimental science. An exact answer exists
for any given problem, and the numerical analyst attempts
to determine this exact answer, to a specified degree of
accuracy, by means of suitably designed experiments.
Mathematics is the tool, but not the aim or object, of the
science.
Short chapters are included on eigenvalues and partial
differential equations. These are topics which are normally
regarded as " advanced ", but from a numerical point of
view they can be treated in almost as elementary a fashion
as the solution of algebraic equations, or ordinary dif-
ferential equations.
For further reading it is recommended that the student
start with Modern Computing Methods, H.M. Stationery
Office, 2nd. Edn. (1961), which both supplements and
complements the present text, and includes a list of papers
and books. In view of the comprehensive bibliography in
Modern Computing Methods it has seemed unnecessary to
include more than a few (non-systematic) references in the
present text.
Unless otherwise indicated the numerical examples have
been worked on a hand-machine, and as a general rule one
extra decimal place has been carried in the intermediate
working. When the student checks numerical examples
his results may of course differ slightly from those in the
text in the last significant digit. Many of the examples at
the ends of chapters are intended to supplement the text,
not to provide additional exercises.
The origins of this book can be traced back to the
lectures of the late D. R. Hartree, and a year spent in the
Mathematical Laboratory, Cambridge, under M. V.
Wilkes in 1947-48 during the exciting early days of EDSAC.
Even though this experience failed to convert me into a
professional computer, it taught me that numerical analysis
is much more than an exercise in pure mathematics or a
collection of recipes.
The preparation of the manuscript of this book was a
time-consuming occupation. I feel indebted to the mathe-
matics department of the future second university in
Glasgow, to my wife, and to all correspondents who have
had to wait for answers to letters. They had much to
endure while the second draft of this book was being
typed with one finger.
It is a pleasure to acknowledge help from many friends.
The ideas in Chapter III on programming for automatic
computers originated from using the system known as
FORTRAN (FORmula TRANslation) on the IBM 704 at
the New York University AEC Computing Facility. I
am grateful to E. Isaacson for encouragement to use this
machine and to Max Goldstein for indispensable advice.
The work was made possible by contract AF 19(604)5238 of
the Electronics Directorate of the U.S. Air Force Cambridge
Research Centre, which financed a visit to the Division of
Electromagnetic Research, the Institute of Mathematical
Sciences, New York University. A first draft of Chapter III
was considerably altered after correspondence with T.
Brooker and S. Gill, and a subsequent meeting with members
of the committee on automatic programming of the British
Computer Society, chairman C. Strachey.
A remark of T. Brooker changed the emphasis of
Chapter II from " solution of equations " to " iterative
procedures ". D. J. Wheeler introduced me to the idea of
the " noise-level " of an iterative calculation. Various
comments in the text on the representation of functions by
polynomials have been modified as a result of criticisms by
G. F. Miller. Several points have been clarified by dis-
cussions with A. J. Howie, G. N. Lance, A. van Wijngaarden
and J. H. Wilkinson.
Many of the numerical examples have been checked by
A. Paton and M. V. Sweet. Numerical results have been
contributed by: J. G. Fraser and M. V. Sweet (Ex. 10.23),
A. Paton (Table 4.1), M. Walker (Ex. 11.8), F. J. Warner
(Ex. 10.17), S. B. L. Wilson (the table (11.26)), C. S. Wu
(Ex. 10.22).
A. C. Aitken, J. G. Fraser and M. V. Sweet have spent
much time in reading the manuscript in painstaking detail.
Their valuable criticisms have increased the accuracy and
readability of the text and the examples. G. Fairweather,
R. C. Peebles and B. Shaw have assisted in proof-
reading and contributed helpful comments. D. Greenspan
has suggested improvements and detected inaccuracies in
both manuscript and proofs. I am indebted to the editors,
A. C. Aitken and D. E. Rutherford, for advice and recom-
mendations.
Above all I am grateful to Miss A. Paton. In a period
of numerous distractions her assistance ensured a regular
weekly production of manuscript. It is certain that without
her help this book would never have been written.
B. NOBLE
Mill Cottage,
Broughton Mills, Lancs.
July, 1962.
CONTENTS
Preface v

Volume I: ITERATION, PROGRAMMING AND ALGEBRAIC EQUATIONS

I. Accuracy and Error
1.1 Introduction 1
1.2 Rounding off 2
1.3 Absolute and relative errors 3
1.4 Error analysis and control 6
1.5 The evaluation of formulae on desk machines 11
1.6 Mistakes 15
Examples I 17

II. Iterative Methods, with Applications to the Solution of Equations
2.1 Introduction 19
2.2 A simple iterative method 20
2.3 The Newton-Raphson iterative method 26
2.4 General aspects of iterative procedures 31
2.5 Real roots of polynomials 38
2.6 Errors when finding roots of polynomials 41
2.7 Bairstow's method for finding complex roots of polynomials 46
Examples II 50

III. Elementary Programming for Automatic Computers
3.1 Introduction 54
3.2 Simple programs 56
3.3 Some programs involving iterative procedures 65
3.4 General comments 75
Examples III 78
IV. Simultaneous Linear Algebraic Equations
4.1 Introduction 80
4.2 The method of successive elimination 82
4.3 Choice of pivots and scaling 87
4.4 Inherent error and ill-conditioned equations 92
4.5 A computer program for the method of successive
elimination 98
Examples IV 100
V. Matrix Methods
5.1 Matrix algebra 102
5.2 A compact elimination method for the solution
of linear equations 105
5.3 The inverse matrix 112
Examples V 118
VI. Eigenvalues and Eigenvectors
6.1 Introduction 123
6.2 An iterative method for finding the largest eigen-
value 129
6.3 The determination of subdominant eigenvalues
and eigenvectors 135
6.4 The iterative solution of linear simultaneous
algebraic equations 138
Examples VI 147
Index 154
Volume II: DIFFERENCES, INTEGRATION AND DIFFERENTIAL EQUATIONS
VII. Finite Differences and the Approximate Repre-
sentation of Functions
VIII. Polynomial Interpolation
IX. Numerical Integration and Differentiation
X. Ordinary Differential Equations
XI. Partial Differential Equations
CHAPTER I
ACCURACY AND ERROR
§ 1.1. Introduction. The solution of a problem in
applied mathematics usually consists of numbers which
satisfy some kind of equation. In theory these numbers
may be specified exactly by the equation, but even in the
simplest cases (of which x² = 2 can be taken as a trivial
example) it is normally impossible to write down an exact
decimal representation of the solution. The aim of numerical
methods is to provide practical procedures for calculating the
solutions of problems in applied mathematics to a specified
degree of accuracy.
We shall be constantly preoccupied with the estimation
and prevention of errors. This is merely recognition of the
fact that to produce a number which is the answer to a
problem is only half the aim of numerical methods. The
other half, and often the more difficult half, is to ensure that
the answer is correct within stated limits. Any numerical
answer should be accompanied by a statement of its maximum
error or its probable error.
Errors occur in computation in four ways:
(a) Round-off errors due to the need to represent
numbers in decimal notation, using a finite
number of decimal places.
(b) Mistakes.
(c) Errors due to the nature of the formulae used, e.g.
the need to represent a continuous function by
a table of numbers.
(d) Some of the numbers which must be used in the
computation may have inherent error, or may be
known only approximately, e.g. physical data.
It is a most important principle in numerical work that
all sources of error must be borne constantly in mind.
§ 1.2. Rounding off. The numbers 0.745 25 and
0.007 452 5 are given to five and seven decimal places
respectively. The number of significant figures is defined
as the number of digits that are assumed to be correct,
starting at the left with the first non-zero digit, and proceeding
to the right. Thus if all the non-zero digits in the above
numbers are correct, they both possess five significant
figures. Zeros to the left of the first non-zero digit are not
significant and serve merely to mark the decimal point.
Zeros to the right of the last non-zero digit are regarded as
significant. Thus 0.4300 possesses four significant figures.
Ambiguity occurs with a number like 4300. This can be
written 4.300 × 10³ if it is desired to indicate that it is
correct to four significant figures. 4.30 × 10³ is correct to
three significant figures.
If only certain digits in a number are correct the
superfluous digits can be discarded by rounding off. The
rule for doing this is as follows. To round off a number to
n significant figures, retain n significant digits and discard
the remainder. If the discarded number is less than half a
unit in the nth place, leave the nth digit unchanged; if it
is greater than half a unit in the nth place, add unity to the
nth digit; if it is exactly half a unit, round off to the nearest
even digit. The object of the last part of the rule is to
ensure that when exactly half a digit is discarded, numbers
will be rounded up and rounded down in equal proportions,
on the average. The only reason for rounding to even
rather than to odd digits is that the resulting number is
divisible by two without ambiguity. If 7.4525 is rounded
off to 4, 3, 2 figures we obtain 7.452, 7.45, 7.5. But note
that if 7.45 is rounded off to two significant figures, without
further information, we obtain 7.4.
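The round-to-even rule can be checked mechanically. In modern Python (which this book of course predates) the standard decimal module implements exactly this rule under the name ROUND_HALF_EVEN, so a short sketch reproduces the worked figures for 7.4525; the helper name round_off is my own:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_off(x: str, quantum: str) -> Decimal:
    """Round the decimal number x to the given quantum, e.g. '0.001',
    rounding an exact half to the nearest even digit."""
    return Decimal(x).quantize(Decimal(quantum), rounding=ROUND_HALF_EVEN)

# 7.4525 rounded to 4, 3, 2 significant figures, as in the text:
print(round_off("7.4525", "0.001"))  # 7.452  (exact half: 2 is even)
print(round_off("7.4525", "0.01"))   # 7.45   (discarded 0.0025 < half)
print(round_off("7.4525", "0.1"))    # 7.5    (discarded 0.0525 > half)
print(round_off("7.45", "0.1"))      # 7.4    (exact half: 4 is even)
```

Note that the strings are passed to Decimal directly: the float literal 7.4525 has no exact binary representation, so rounding the float instead of the decimal string could give a different last digit.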
§ 1.3. Absolute and relative errors. If n is an approximation
to a number N which is defined exactly, the error ε of n
is defined by N = n + ε. (Some writers use the definition
n = N + ε but the distinction is not important, though it is
better to be consistent.) The absolute error is |ε| and the
(absolute) relative error is |ε|/|N|. We shall be interested
in comparatively rough estimates of the error and we assume
that the relative error is small. In this case the relative
error can be taken equal to |ε|/|n|, which has the advantage
that the denominator is known.
It is often possible to establish an upper limit for the
magnitude of the error. For example, if a number is given
to exactly m decimal places then |ε| ≤ 0.5 × 10⁻ᵐ. We call
this upper limit for the magnitude of the error the maximum
error and denote it by e. Thus |ε| ≤ e. The relative error
is less than e/|N| ≈ e/|n| (assuming that e is much less
than |n|).
Let n₁, n₂ be approximations to N₁, N₂ with errors
ε₁, ε₂, so that
N₁ = n₁ + ε₁,  N₂ = n₂ + ε₂.
Consider first addition and subtraction:
N₁ + N₂ = (n₁ + n₂) + (ε₁ + ε₂),
N₁ − N₂ = (n₁ − n₂) + (ε₁ − ε₂).
If |ε₁| ≤ e₁, |ε₂| ≤ e₂, then |ε₁ + ε₂| ≤ e₁ + e₂ and
|ε₁ − ε₂| ≤ e₁ + e₂. Hence if two numbers are added or
subtracted, the maximum error in the sum or difference is
the sum of the maximum errors in the two numbers. For
multiplication we have
N₁N₂ = n₁n₂ + n₂ε₁ + n₁ε₂ + ε₁ε₂.
If we denote n₁n₂ by n, and the error in n by ε,
ε = n₂ε₁ + n₁ε₂,  i.e.  (ε/n) = (ε₁/n₁) + (ε₂/n₂),
where a second-order term ε₁ε₂ has been neglected. Hence
the relative error in a product is the sum of the relative
errors in the two numbers which have been multiplied
together. In a similar way, for division,
N₁/N₂ = (n₁ + ε₁)/(n₂ + ε₂)
  ≈ n₂⁻¹(n₁ + ε₁)(1 − ε₂/n₂)
  ≈ (n₁/n₂) + (ε₁/n₂) − (n₁ε₂/n₂²),
where we have used the binomial theorem and neglected
second-order terms. Introduce n = n₁/n₂ with error ε.
Then
(ε/n) = (ε₁/n₁) − (ε₂/n₂).
We deduce from these results that for both multiplication
and division the maximum relative error in the result is the
sum of the maximum relative errors of the given numbers.
Absolute error is connected with decimal places and is
important when adding and subtracting. Relative error is
connected with significant figures and is important when
multiplying and dividing.
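These propagation rules are easy to confirm numerically. The sketch below (plain Python; the helper name and the sample numbers are my own) perturbs both operands to their extreme values and compares the worst error actually attained with the bounds just derived:

```python
from itertools import product

def worst_case_error(op, n1, e1, n2, e2):
    """Largest |op(N1, N2) - op(n1, n2)| over the extreme values
    N1 = n1 +/- e1, N2 = n2 +/- e2."""
    centre = op(n1, n2)
    return max(abs(op(n1 + s1 * e1, n2 + s2 * e2) - centre)
               for s1, s2 in product((-1, 1), repeat=2))

n1, e1 = 49.84, 0.005   # each number correct to its last digit,
n2, e2 = 0.15, 0.005    # so maximum error is half a unit there

# Addition: maximum error of the sum is the sum of the maximum errors.
add_err = worst_case_error(lambda a, b: a + b, n1, e1, n2, e2)
print(add_err)  # 0.01, i.e. e1 + e2

# Multiplication: relative error of the product is approximately the
# sum of the relative errors; the discrepancy is the neglected
# second-order term e1*e2.
mul_err = worst_case_error(lambda a, b: a * b, n1, e1, n2, e2)
predicted = (e1 / n1 + e2 / n2) * (n1 * n2)
print(mul_err, predicted)
```

The multiplication check makes the "neglected second-order term" of the derivation visible: the two printed values differ by exactly ε₁ε₂ (up to machine rounding).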
We consider some numerical examples which illustrate
and extend these remarks. It will be assumed that the
numbers with which we start are correct to the last digit
stated. Then a number with approximate representation
3.24, say, must lie between 3.235 00... and 3.244 99....
A convenient way of expressing this roughly is to say "3.24
is correct to 1 in 600".
The sum 49.84 + 0.15 = 49.99 is correct to 1 in 5000
even though one of the numbers is correct to only 1 in 30.
The sum 0.4984 + 15 = 15.4984 is correct to only 1 in 30.
The difference 49.99 − 49.84 = 0.15 is correct to only 1 in 15
although the original numbers are both correct to 1 in
10,000. The loss of significant figures in this way is one of
the most insidious sources of error in numerical work.
It is often possible to avoid errors due to the subtraction of
two nearly equal numbers by transforming the formulae
used in the computation, as illustrated in Exs. 1.1, 1.2
below.
The sum S = 15.9 + 13.444 + 10.7246 = 40.0686 is
accurate only to ±0.05. We can discard the last two digits
in the answer, writing S = 40.07 ± 0.05. It would be
slightly misleading to write the answer as 40.1, but this is
of course permissible if it is recognised that the last digit
may be nearly one unit in error. Alternatively one could
write S = 40.0₇ where the notation indicates that the
number is accurate to within a few units in the depressed
digit. We could save ourselves labour if we rounded off to
two decimal places before adding: S ≈ 15.9 + 13.44 + 10.72
= 40.06. This is not strictly speaking correct since the sum
is in fact 40.07, but a unit in the second decimal place is not
important when we know that the result may be in error by
five units in this place. On the other hand it is not
permissible to write S = 15.9 + 13.4 + 10.7 = 40.0. The errors
we have introduced by rounding off are now comparable
with the inherent error already present so that we have
unnecessarily increased the error. The moral is that if we
wish to round off numbers before a series of additions and
subtractions, and the least accurate number is accurate to n
decimal places, we should round off to n + r decimal places
where r is a suitable positive integer usually chosen to be
1 or 2.
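The effect of the guard place is easy to see by redoing the sum above in Python's decimal module (which this book predates; the sums are the ones worked in the text):

```python
from decimal import Decimal

terms = [Decimal("15.9"), Decimal("13.444"), Decimal("10.7246")]

exact = sum(terms)                                            # 40.0686
two_places = sum(t.quantize(Decimal("0.01")) for t in terms)  # 40.06
one_place = sum(t.quantize(Decimal("0.1")) for t in terms)    # 40.0

# Rounding the addends to two decimals (one guard place beyond the
# least accurate term) perturbs the sum by a unit in the second place,
# well under the inherent error of +/-0.05; rounding to one decimal
# costs a full 0.07 and is not permissible.
print(exact, two_places, one_place)
```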
When multiplying or dividing, if the numbers have
nearly the same relative error the relative error in the result
is approximately twice the relative error in the individual
numbers. If one number is more accurate than the other,
the relative error is approximately the same as the relative
error in the less accurate number. Thus 3.34² = 11.1556
is accurate to 1 in 300, and 3.34 × 3.335 = 11.1389 is
accurate to 1 in 600. Although the less accurate number in
each of these examples is given to three significant figures
it would be misleading and inaccurate to round off the
answers to three significant figures.
§ 1.4. Error analysis and control. When using a desk
calculating machine, round-off errors are under the direct
control of the person doing the computation. It would be
tiresome to have to round off every single number in a
calculation to the appropriate number of significant figures.
On the other hand it is essential to employ round-off to
save unnecessary labour and keep the lengths of numbers
within reasonable limits. One difficulty is illustrated by
the calculation 0.014 63 ÷ 79.3 = 0.000 184 5. To obtain
an answer correct to 1 in 1000 it is necessary to carry seven
decimal places in the answer. On the other hand if we can
scale our variables so that the calculation reads 1.463 ÷ 0.793
= 1.845 we need carry only three decimals. It is desirable
to arrange, by scaling or otherwise, that in the body of the
calculation all numbers are rounded to n+1 or n+2 decimal
places and that this then gives an answer correct to n
decimals. A more elaborate example of scaling is given in
§ 4.1, Ex. 4.1.
A similar situation can arise in connection with auto-
matic digital computers. These machines perform essentially
the same arithmetic operations as a desk calculator — they
add, subtract, multiply and divide numbers stored inside
the machine (see § 3.1). Analysis of round-off errors will
depend on the way in which numbers are stored. If a
fixed-point system is used this means that numbers are
represented with a fixed number of decimal places, e.g.
0.0015, 793.1482. If a floating-point system is used numbers
are represented with a fixed number of significant figures,
e.g. 1.463 × 10⁻³, 7.931 × 10², where we have placed the
decimal point immediately after the first significant digit.
The scaling difficulty mentioned in the last paragraph may
be particularly important if calculations have to be performed
in a fixed-point system.
In hand computing it is possible to vary the number of
decimal places being carried in the light of the results
obtained as the calculation proceeds in order to obtain an
answer of the required accuracy with the minimum of
labour. In automatic computing the number of decimals
depends on the design of the machine and it may be difficult
to predict beforehand the course of a calculation. It is
often easier to make the machine compute as accurate an
answer as possible, subject to round-off and other errors,
and then determine the accuracy that has been achieved,
rather than try to produce an answer of specified accuracy.
It is important to realise that calculations can often be
arranged so that the effect of round-off errors is minimised.
This is illustrated in the following two examples.
Ex. 1.1. Find the roots of x² − 26x + 1 = 0.
We have x = 13 ± √168 = 13 ± 12.961. Hence the
roots are x₁ = 25.961, x₂ = 0.039. The second root has
been obtained to only two significant figures even though
the square root was calculated to five figures. However,
instead of calculating the second root in this way we can use
the relation x₁x₂ = 1, so that x₂ = 1/25.961 = 0.038 519,
in error by at most one unit in the last digit.
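The same remedy works for any quadratic whose roots differ greatly in size: compute the root of larger magnitude from the usual formula, where there is no cancellation, and the smaller root from the product of the roots. A sketch for the monic case x² + bx + c = 0 (the function name is my own):

```python
import math

def quadratic_roots(b, c):
    """Roots of x^2 + b*x + c = 0, avoiding the subtraction of
    nearly equal numbers when computing the smaller root."""
    # Choosing the sign of the square root to match b means the two
    # terms in the numerator add rather than nearly cancel.
    x1 = (-b - math.copysign(math.sqrt(b * b - 4 * c), b)) / 2
    # The product of the roots is c, so the other root follows
    # by a division, as in the text.
    x2 = c / x1
    return x1, x2

x1, x2 = quadratic_roots(-26.0, 1.0)   # x^2 - 26x + 1 = 0
print(x1, x2)  # approximately 25.961 and 0.038519
```

Computing x₂ as 13 − 12.961 instead would reproduce the two-figure result 0.039 of the naive method.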
Ex. 1.2. Evaluate
c = (bA − aB)/(b − a),  (1.1)
with b = 0.485, a = 0.231, B = 6.327 19, A = 6.322 81,
where the data are accurate only to the number of figures
stated.
If we use the formula for c as it stands, it might seem
that the result will be accurate to only three significant
figures since the denominator is accurate to only 1 in 250.
However, if we rearrange the formula in the form
c = A − a(B − A)/(b − a),  (1.2)
we find, on substituting numbers and carrying out a rough
error analysis, that c = 6.318 83, correct to within two units
in the last digit quoted. If we substitute directly in the
original formula (1.1) and carry sufficient places in the
intermediate working we find
c = 1.604 982/0.254 = 6.318 83.
This is the same answer as before, accurate to two units in
the last digit quoted, even though we know that the
denominator is accurate to only 1 in 250. The reason is
clear on performing an error analysis algebraically. Suppose
that the exact value of a is given by a + δa, and similarly for
b, A, B, c. Then on neglecting second-order quantities it is
found that
δc = [(b + δb)(A + δA) − (a + δa)(B + δB)] / [(b − a) + (δb − δa)] − (bA − aB)/(b − a)
   ≈ {(B − A)(aδb − bδa) + (b − a)(bδA − aδB)}(b − a)⁻².
Terms involving A and B have cancelled to give a final error
which depends only on (B − A), δA, and δB, all of which are
small. Substitution in this formula again gives the maximum
possible error as two units in the fifth decimal place.
Formula (1.2) is more convenient than (1.1) since the
error can be estimated more directly from it, and since fewer
decimal places need be carried in the computation to obtain
a given accuracy in the final answer. This is an important
example of the way in which a formula can be rearranged to
give a more suitable formula for computation.
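Both forms of the formula, and the insensitivity the analysis predicts, can be checked directly. The sketch below (plain Python; the perturbation loop simply tries every extreme combination of the data errors, taking half a unit in the last quoted place of each datum):

```python
from itertools import product

a, b, A, B = 0.231, 0.485, 6.32281, 6.32719
da, db = 0.0005, 0.0005          # half a unit in the third decimal
dA, dB = 0.000005, 0.000005      # half a unit in the fifth decimal

c1 = (b * A - a * B) / (b - a)   # formula (1.1)
c2 = A - a * (B - A) / (b - a)   # rearranged formula (1.2)
print(c1, c2)                    # both about 6.31883

# Worst actual error of c over the 16 extreme values of the data:
worst = max(abs(((b + sb * db) * (A + sA * dA)
                 - (a + sa * da) * (B + sB * dB))
                / ((b + sb * db) - (a + sa * da)) - c1)
            for sa, sb, sA, sB in product((-1, 1), repeat=4))
print(worst)  # a few units in the fifth decimal place
```

Despite the denominator being accurate to only 1 in 250, the worst error is of the order of 10⁻⁵, in agreement with the cancellation exhibited by the algebraic error analysis.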
Error analysis in terms of maximum error, as discussed
in the last section, is useful in avoiding particular sources
of error, but in a long calculation an error analysis worked
out on the basis of the maximum possible error will over-
estimate the errors which are likely to occur in practice by
a very large factor. For instance if the maximum absolute
error of a single result is e then the maximum possible error
that can be obtained when adding n numbers is ne, though
this is very unlikely to occur in practice. In fact if errors
are randomly distributed between ±e, there is only one
chance in 20 that the error in the sum of n numbers will
exceed 2(n/3)^{1/2}e, assuming that the errors in the n numbers
are independent. There is only one chance in 370 that the
error will exceed 3(n/3)^{1/2}e, and this number is much less
than ne if n is large. More precisely, the following results
are given by statistical theory.† The details lie outside the
scope of this book. If errors are distributed around zero
according to a frequency distribution y = f(x), the variance
of the error in a single number is defined to be
σ² = ∫_{−∞}^{∞} x²f(x) dx,  where  ∫_{−∞}^{∞} f(x) dx = 1.
The variance of the error in the sum of n numbers is nσ²
and the standard deviation of the error of the sum is the
square root of the variance, or n^{1/2}σ. The distribution of the
sum tends to the normal or Gaussian form as n increases.
Thus there is only one chance in 20 of obtaining an error
of more than twice the standard deviation. As an application
of these results suppose that errors are uniformly
distributed between ±0.5, so that f(x) = 1 for −0.5 < x < 0.5
and f(x) = 0 for |x| > 0.5. Then
σ² = ∫_{−0.5}^{0.5} x² dx = 1/12.
The standard deviation of the error of the sum of n numbers
is therefore (n/12)^{1/2}. On adding 100 numbers the standard
deviation is approximately 3 and if the errors are randomly
distributed it is unlikely that the total error will be more
than ±6. It is very unlikely that the error will be more
than ±10. In contrast, the maximum possible error is
±50. Statistical considerations are particularly important
in connection with automatic computers, where calculations
may involve an extremely large number of operations.
† A. C. Aitken, Statistical Mathematics, Oliver and Boyd (1942),
pp. 35, 62.
From an analogy with noise in an electrical circuit, round-off errors which occur at random can be regarded as "noise" superimposed on an exact calculation. Some interesting comments on round-off noise are given in R. W. Hamming, Numerical Methods for Scientists and Engineers, McGraw-Hill (1962), Chapter II. It is also pointed out in that chapter that in many circumstances the leading (i.e. most significant) digit of the numbers in a calculation may tend to be small. Thus if two numbers are multiplied together, each being chosen at random from 1, 2, ..., 9, then by examining the leading digits of the 81 possible products it is easily seen that the product will probably be a number with a small leading digit.
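The tendency just described can be checked with a short computation (a modern aside, of course, rather than anything available in the book's desk-machine setting): tabulate the leading digits of the 81 products i × j for i, j = 1, 2, ..., 9.

```python
# Count the leading digits of the 81 possible products i*j (i, j = 1..9).
from collections import Counter

counts = Counter(int(str(i * j)[0]) for i in range(1, 10) for j in range(1, 10))

for digit in range(1, 10):
    print(digit, counts[digit])
```

Small leading digits predominate: 1 occurs most often as the leading digit, and large leading digits are comparatively rare.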
A theoretical analysis of error, statistical or otherwise,
is invaluable when comparing the relative efficiencies of
various methods for solving a given type of problem. Even
when it is possible to estimate errors within useful limits,
however, we are usually interested in the actual error in
the answer to a specific concrete example rather than the
average or maximum error when the method is applied to a
class of examples. For this reason we shall not devote
much space in this book to the determination of maximum
or average errors in computing procedures. Instead the
following approach is adopted. After a general analysis
of the sources of error in a given method, the numerical
procedure is arranged so that an estimate of the accuracy
of the answer can be obtained empirically from the results
obtained during the course of the calculation. Some of
the ways in which this can be done will be explained later in
connection with particular examples, but the general
principle is stated here to emphasise that from the point of
view adopted in this book the object of theoretical error
analysis is practical error control.
§ 1.5. The evaluation of formulae on desk machines. It
will be assumed in this section that the reader is familiar
with the operation of an ordinary desk calculator possessing
a setting register, a multiplier register, and an accumulator.
For example, in forming the product ab the number a is
inserted in the setting register. When this number is
multiplied by b the number b appears in the multiplier
register and the product ab in the accumulator.
Without going into detail we emphasise briefly several
aspects of the use of such machines, with particular reference
to the way in which good machine technique can prevent
the occurrence of mistakes. An elementary example occurs
in connection with reading negative numbers off the machine.
Negative numbers appear in the accumulator as complements: for example 99 ... 994 732 represents the number −5268. In reading such a number off the machine, the
safe procedure is to set the complement, calculated mentally,
(namely 5268), in the setting register; add into the
accumulator; check that the accumulator then reads zero,
and record the number in the setting register. Another
example occurs in connection with transferring numbers
from the accumulator to the setting register. The number
which is entered in the setting register should be checked in
an obvious way by subtracting it from the contents of the
accumulator.
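The complement convention can be imitated in a few lines (an illustrative sketch; the register width of 10 digits is an assumption for the example, not something fixed by the text):

```python
# Negative numbers on a 10-digit accumulator appear as tens complements:
# -5268 is stored as 10**10 - 5268 = 9999994732.
DIGITS = 10
MOD = 10 ** DIGITS

def to_accumulator(n):
    # any integer reduced to its 10-digit accumulator representation
    return n % MOD

def read_off(shown):
    # the checking procedure from the text: add the mentally computed
    # complement and confirm the accumulator then reads zero
    complement = MOD - shown
    assert (shown + complement) % MOD == 0
    return shown - MOD if shown > MOD // 2 else shown

print(to_accumulator(-5268))   # 9999994732
print(read_off(9999994732))    # -5268
```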
On the whole it is a good idea to do as much work as
possible on the machine. For instance unnecessary recording of numbers should be avoided. Thus a×b×c and a×b+c can be calculated without recording intermediate numbers. Similarly expressions like

a₁b₁ + a₂b₂ + a₃b₃ + …
can be calculated without intermediate recording. In this type of calculation (as in any calculation involving a repetitive procedure) it is essential to fix the positions of the decimal points in the setting register, the multiplier register, and the accumulator, subject of course to the condition that in a multiplication the number of decimal places in the accumulator is the sum of the numbers of places in the other two registers.
Care has to be exercised when transferring numbers from the work sheet to a calculating machine or vice versa, or from one part of the sheet to another. Common errors occur from transposing digits (e.g. 8452 may be transferred as 8542), or from repeated digits (e.g. 8455 may be transferred as 8445).
Wherever possible the work should be reduced to a
routine. Tabulation is a great help in this respect. It is
usually best to arrange a tabulation so as to do only one
thing at a time. Differencing procedures are often useful
in checking (§ 7.3).
Ex. 1.3. Find f(1.518) by linear interpolation, given that

f(1.300) = 0.752 15,  f(1.600) = 0.746 39.

Also find the value of x for which f(x) = 0.750 61.
If we wish to evaluate y = f(x) by linear interpolation given y₀ = f(x₀) and y₁ = f(x₁), we mean that we wish to evaluate y assuming that the point (x, y) lies on the straight line joining (x₀, y₀) and (x₁, y₁). It is easily verified that the required formula is

y = {(x₁ − x)y₀ + (x − x₀)y₁}/(x₁ − x₀).  (1.3)

For computational purposes it is desirable to rearrange this formula as

y = y₀ + p(y₁ − y₀),  p = (x − x₀)/(x₁ − x₀).  (1.4)

(This has been explained in Ex. 1.2. Compare (1.1), (1.3) and (1.2), (1.4) with b = x₁ − x, a = x − x₀, D = y₁, A = y₀.)
In this example p = 0.218/0.300 = 0.727, to three figures. A safe procedure on a calculating machine is as follows. Set 0.752 15. Multiply by 1.000, where we use three decimal places since p is specified to three places. Clear the multiplier register. Calculate y₁ − y₀ mentally (−0.005 76). Set 0.005 76 and turn the handle once backward (since y₁ − y₀ is negative). Check that the accumulator now reads f(1.6) = 0.746 39. This checks the setting of f(1.3) and the mental calculation of y₁ − y₀, including its sign. The multiplier register now reads 1.000. Turn the handle until it shows 0.727. The number in the accumulator is f(1.518) = 0.747 96, rounded to five decimals.

To answer the last part of the question, which involves "inverse linear interpolation", turn the handle till the accumulator shows the number nearest to 0.750 61. The number in the multiplier register is then 0.267. The required value of x is therefore 1.300 + 0.3 × 0.267 = 1.380, to three decimal places.
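The same interpolation, and its inverse, can be checked by direct computation (a sketch using formula (1.4), not a model of the machine procedure itself):

```python
# Formula (1.4): y = y0 + p*(y1 - y0), p = (x - x0)/(x1 - x0),
# applied to the data of Ex. 1.3, together with inverse interpolation.
x0, y0 = 1.300, 0.75215
x1, y1 = 1.600, 0.74639

def interp(x):
    p = (x - x0) / (x1 - x0)
    return y0 + p * (y1 - y0)

def inverse_interp(y):
    p = (y - y0) / (y1 - y0)
    return x0 + p * (x1 - x0)

print(round(interp(1.518), 5))            # 0.74796, as found on the machine
print(round(inverse_interp(0.75061), 3))  # 1.38
```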
Ex. 1.4. Evaluate the following polynomials to four decimal places for x = 0(0.2)1.0:

f₁(x) = x − ⅓x³ + ⅕x⁵,
f₂(x) = 0.995 36x − 0.288 69x³ + 0.079 34x⁵,
f₃(x) = 0.994 95x − 0.287 06x³ + 0.078 04x⁵.

Compare the results with the corresponding values of f(x) = arc tan x.

The notation x = a(p)b means "from the value x = a to the value x = b, inclusive, at intervals of p in x". Since four decimal places are required in the answer, five decimals should be carried in the working. We write

f₁(x) = ⅓x(3 − x² + 0.6x⁴).
The values of f₁(x) can be found systematically by means of a tabulation with the following headings:

x    x²    x⁴    ⅓x    f₁(x)

The value of ⅓x is calculated mentally. The values of x² and x⁴ can be obtained from Barlow's Tables.† The value of f₁(x) is obtained in one desk machine operation involving two additions and two multiplications. Three numbers are recorded in addition to x and f₁(x). It is convenient to set x², x⁴ on the machine and to use 0.6 and ⅓x as multipliers.
For f₂(x) we can proceed similarly except that it is slightly more convenient to write

f₂(x) = x(0.995 36 − 0.288 69x² + 0.079 34x⁴)

and the table heading ⅓x is now omitted. The evaluation of f₂(x) is carried out in one machine operation involving three multiplications and one addition. Two recordings are required besides x and f₂(x). It is convenient in this case to set the five-decimal-place coefficients in the setting register, and multiply by x², x⁴.
If we had to evaluate f₂(x) for x = 0(0.171)0.513, say,
then x², x⁴ would have to be computed on the desk machine, so that the procedure in the last paragraph would require five multiplications and one addition. In this case we should probably prefer to use nested multiplication:

a₁x + a₃x³ + a₅x⁵ = x{a₁ + u(a₃ + a₅u)},

where u = x². This requires four multiplications and two additions. Only one recording is required besides x and f₂(x). In hand computing, the number of decimal places and significant figures needed at various stages of the calculation has a decisive effect on the organisation of the calculation. In an automatic computer all numbers are treated alike irrespective of the number of significant figures they contain, and the method used for evaluating a polynomial will not depend on whether the coefficients or the values of x are simple.

† Edited by L. J. Comrie, E. & F. Spon (London) (1952).
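Nested multiplication is exactly the arrangement a program would use; as a sketch (taking the coefficients of f₂ as test data):

```python
# a1*x + a3*x**3 + a5*x**5 = x*(a1 + u*(a3 + a5*u)) with u = x**2:
# four multiplications and two additions, one intermediate value (u).
def nested_odd_quintic(x, a1, a3, a5):
    u = x * x                            # the single recorded intermediate
    return x * (a1 + u * (a3 + a5 * u))

a1, a3, a5 = 0.99536, -0.28869, 0.07934  # coefficients of f2 in Ex. 1.4
x = 0.513
direct = a1 * x + a3 * x ** 3 + a5 * x ** 5
print(abs(nested_odd_quintic(x, a1, a3, a5) - direct) < 1e-12)
```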
The polynomial f₁(x) consists of the first three terms of the Taylor series expansion of arc tan x. The accuracy of a truncated Taylor series decreases with increase in the distance from the point about which the function is expanded. This is illustrated by the numerical results in Table 1.1. The polynomial f₂(x) gives a much better approximation over the whole range of x although it is not so accurate as f₁(x) near x = 0.

Table 1.1
Results for Exercise 1.4 (differences in units of the fourth decimal place)

x            0.2      0.4      0.6      0.8      1.0
f(x)         0.1974   0.3805   0.5404   0.6747   0.7854
f(x)−f₁(x)   0        −2       −32      −201     −813
f(x)−f₂(x)   +6       0        −6       +2       −6
f(x)−f₃(x)   +7       +1       −6       +1       −5

The polynomial f₃(x) is derived in Ex. 7.5 by
truncating a Chebyshev series. It is almost as good a representation of arc tan x over the whole range as the polynomial f₂(x), which is the optimum polynomial in the minimax sense explained in § 7.5. (See also Ex. 3.6 and the reference given there.) Polynomial representations like f₂(x), f₃(x) are particularly useful in automatic computing.
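Table 1.1 can be reproduced directly (a modern check on the tabulated differences, in units of the fourth decimal place):

```python
# Recompute Table 1.1: f(x) = arctan x against the three polynomials.
from math import atan

f1 = lambda x: x - x ** 3 / 3 + x ** 5 / 5
f2 = lambda x: 0.99536 * x - 0.28869 * x ** 3 + 0.07934 * x ** 5
f3 = lambda x: 0.99495 * x - 0.28706 * x ** 3 + 0.07804 * x ** 5

for x in (0.2, 0.4, 0.6, 0.8, 1.0):
    diffs = [round((atan(x) - f(x)) * 1e4) for f in (f1, f2, f3)]
    print(x, round(atan(x), 4), diffs)
```

The growth of the f₁ column with x, against the roughly level f₂ and f₃ columns, is the contrast between a truncated Taylor series and a near-minimax approximation.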
§ 1.6. Mistakes. The main part of this section is
concerned with mistakes due to faulty computing technique
when using a desk calculating machine. No matter how
carefully a computation is performed there is always some
risk that mistakes will occur and it is essential to include
checks throughout a calculation to ensure that mistakes
will be detected as quickly as possible.
Experience is the best teacher in showing the beginner
where mistakes are likely to occur and how mistakes can
be avoided. It is sufficient to mention some of the more obvious sources of error: illegible figures, untidy arrangement (particularly when figures are arranged in columns), unsystematic work, trying to do too many steps at once.
To some extent computing technique will depend on the
individual and the following notes represent personal
preferences. It is convenient to work in ink on paper ruled
in rectangular boxes, or on squared paper with numbers
inserted in suitable boxes formed from the squares. A
mistake is corrected by drawing a line through a wrong
number and inserting the corrected value above the wrong
one. This is to be preferred to the use of pencil and rubber
for various reasons. Use of ink encourages accuracy. It
is useful to know when mistakes have been corrected, and
exactly what corrections have been made, partly because it
is easy to make further errors when correcting mistakes.
Apart from anything else the work-sheet will soon look
unsightly if too many mistakes are made and this is a sign
that the computer should take more care— or even start
again from the beginning.
A point that is often neglected is to include sufficient explanation on the sheets: a calculation should be intelligible a year later. It is advisable to include details of all the steps in the calculation in sequence; scribbling in odd corners of the sheet to obtain intermediate results is to be discouraged. It is precisely in these intermediate results that mistakes are likely to occur.
When an answer has been obtained it may be easy to
check whether it is correct, for example by substitution.
On the other hand the only method of checking may be to
repeat the calculation. (Straightforward repetition by the
same person is not satisfactory since there may be a tendency
to repeat certain kinds of mistake if a calculation is repeated
in the same form by the same person.) Between these two
extremes it is not unreasonable to ask that ten or twenty per cent of the time spent on a calculation should be devoted to checks. It is often convenient to arrange a calculation so that a difference check can be employed (§ 7.3). One has to be doubly careful when considering the correctness of results obtained by means of automatic computers, since usually an automatic machine produces answers without any detailed information about intermediate steps. It is therefore very easy to be unaware of loss of accuracy due to the particular way in which the calculations have been performed inside the machine.

It is important to try to develop a kind of "numerical common-sense" to detect obvious blunders and to keep a mental check whether the results of a calculation are turning out as one would expect.
The length of time taken over a computation is of little importance compared with the necessity for obtaining a correct answer.
Examples I
Ex. 1.5. Perform each of the following calculations on a desk machine without recording intermediate results. State the accuracy of each answer and round off to the appropriate number of significant digits:

409.63 − 53.41 − 492.0 + 63.415,
(93.05 × 4.0732) − (62.14 × 0.2154) − (676 × 0.0873),
2.4076 × 0.049 18 × 0.000 963,
(0.4362 × 0.029 432) ÷ 0.007 44.
Ex. 1.6. Find √39 − √38 to five significant figures, given √39 = 6.2450, √38 = 6.1644.
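The point of this exercise is the loss of significant figures in the subtraction; a short computation makes it visible (the rearrangement 1/(√39 + √38) is the standard remedy, since (√39 − √38)(√39 + √38) = 1):

```python
# Direct subtraction of the five-figure square roots loses accuracy;
# the reciprocal form retains it.
true = 39 ** 0.5 - 38 ** 0.5

direct = 6.2450 - 6.1644       # only three significant figures survive
recip = 1 / (6.2450 + 6.1644)  # uses (sqrt39 - sqrt38)*(sqrt39 + sqrt38) = 1

print(direct)
print(recip)
print(abs(direct - true), abs(recip - true))
```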
Ex. 1.7. A change of δx in x produces a change of approximately (df/dx)δx in f(x). Hence show that if in the determination of cot 0.128 the argument is correct to the number of figures stated, the cotangent is indeterminate to within 0.03. Check this result by finding from tables the values of cot 0.128 and cot 0.1285. Similarly find the degree of determinacy in the following quantities:

sin 1.560,  cos 1.560,  tan⁻¹ 150,  log 1.083,  exp (−2.031).

Find also the number of significant figures in each quantity and check your results from tables. (The point of this example is to remind the reader that when y = f(x) is read from a table the number of significant figures in y is not necessarily the same as the number of significant figures in x.)
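The estimate asked for in this exercise can be checked numerically (a sketch of the δf ≈ f′(x)δx argument for the cotangent):

```python
# For cot x with x = 0.128 correct to +/-0.0005, the induced indeterminacy
# is approximately 0.0005/sin(x)**2, since d(cot x)/dx = -1/sin(x)**2.
from math import sin, cos

def cot(x):
    return cos(x) / sin(x)

dx = 0.0005
estimate = dx / sin(0.128) ** 2         # ~0.03, as the exercise states
actual = abs(cot(0.1285) - cot(0.128))  # direct check

print(round(estimate, 3), round(actual, 3))
```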
Ex. 1.8. Evaluate (cosh θ − cos θ)/(sinh θ − sin θ) to five significant figures for θ = 0.3741 by expanding appropriate terms as power series in θ.
Ex. 1.9. Distinguish between fixed- and floating-point calculations and compare the maximum errors involved in the two systems when computing (i) the sum of n numbers, (ii) the product of n numbers, (iii) the sum of n products of the form aᵢbᵢ.
CHAPTER II
ITERATIVE METHODS, WITH APPLICATIONS
TO THE SOLUTION OF EQUATIONS
§ 2.1. Introduction. In this chapter we introduce an important class of numerical procedures, namely iterative methods. These are illustrated by describing typical iterative procedures for solving transcendental and algebraic equations. (Equations like cot x = 2x, x ln x = 4, containing transcendental functions, are called transcendental equations. A simple algebraic equation is x³ = 3x + 2. All of these equations can be written in the form f(x) = 0.)

The first step in an iterative solution of an equation f(x) = 0 is to locate the roots approximately. An easy method for locating any real roots is to draw a graph of the function y = f(x) and find the points where the curve cuts the x-axis. Sometimes it is convenient to write f(x) = 0 in the form f₁(x) = f₂(x) and to plot the two functions y = f₁(x) and y = f₂(x). The real roots of f(x) = 0 are given by the abscissae of the points of intersection of the two graphs.
Various methods are available for the numerical solution
of equations and it lies outside the scope of this short book
to describe and compare even the more important of these.
It is not implied in this chapter that the iterative methods
which we discuss are to be preferred to any other methods
for solving equations numerically. If an equation is
complicated and results are required to only a few decimal
places there may be little point in using, for instance, a
rapidly convergent iterative method like the Newton-
Raphson process of § 2.3. It may be much more convenient
and satisfactory to use repeated plotting on successively
enlarged scales (compare the last paragraph) or the method
of false position (Ex. 2.12). On the other hand, when
finding the complex roots of high-order polynomials it may
be desirable to use sophisticated methods such as those of
Graeffe or Bernoulli.
§ 2.2. A simple iterative method. Consider the equation

x² − 5x + 2 = 0  (2.1)

which has the roots α, β, say, where

α = 2.5 − √4.25 = 0.438 447,  β = 2.5 + √4.25 = 4.561 553.

In order to illustrate a simple iterative procedure we shall solve (2.1) in the following way. The first step is to localise the roots approximately. By evaluating y = x² − 5x + 2 for x = 0, 1, 2, ... we find that (2.1) has one root between 0 and 1, and one root between 4 and 5. We then rearrange (2.1) in the form

x = 0.2x² + 0.4.

This equation can be solved iteratively by writing

xᵣ₊₁ = 0.2xᵣ² + 0.4,  r = 0, 1, 2, ...,  (2.2)

where xᵣ denotes the rth approximation to a root. We start by choosing, say, x₀ = 0 as an approximation to one of the roots of the quadratic. If this is substituted on the right of (2.2) we obtain x₁ = 0.4. This value is then substituted on the right of (2.2) to give x₂, and so on. In this way we find

x₂ = 0.432,  x₃ = 0.437 32,  x₄ = 0.438 25,  (2.3)

where in the last two calculations we have worked to five decimal places. It is clear that, as r increases, xᵣ is approaching the smaller root α of the quadratic. If on the other hand we start with x₀ = 10 we find

x₁ = 20.4,  x₂ = 83.6,  x₃ = 1398.2,

and as r increases xᵣ increases without limit. Before considering the reasons for these results we give some definitions.
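Both behaviours of the iteration (2.2) are easy to reproduce (a modern sketch; α = 2.5 − √4.25 is the smaller root):

```python
# x_{r+1} = 0.2*x_r**2 + 0.4: convergent from x_0 = 0, divergent from x_0 = 10.
alpha = 2.5 - 4.25 ** 0.5      # 0.438447...

x_conv = 0.0
for _ in range(30):
    x_conv = 0.2 * x_conv ** 2 + 0.4
print(round(x_conv, 6))        # 0.438447

x_div = 10.0
for _ in range(5):
    x_div = 0.2 * x_div ** 2 + 0.4
print(x_div > 1e6)             # True: the iterates grow without limit
```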
A calculation of the form

xᵣ₊₁ = F(xᵣ),  r = 0, 1, 2, ...,  (2.4)

where we start from an assumed value for x₀, and then calculate x₁, x₂, ... in succession, as indicated in the example above, is called an iterative procedure. If as r tends to infinity xᵣ tends to a definite value then the procedure is said to converge. If as r tends to infinity xᵣ does not converge (for example if xᵣ tends to infinity) then the procedure is said to diverge.

We examine theoretically the convergence of the iterative procedure (2.4) by a method which will be useful in later sections. On setting xᵣ = x − δxᵣ, where x is an exact root, equation (2.4) becomes

x − δxᵣ₊₁ = F(x − δxᵣ).  (2.5a)

From (2.4), if the procedure converges,

x = F(x).  (2.5b)

On expanding the right-hand side of (2.5a) in a Taylor series and using (2.5b) we find

δxᵣ₊₁ = δxᵣF′(x) − ½δxᵣ²F″(x) + ….  (2.6a)

In this section we assume that F′(x) is non-zero. When the difference between x and xᵣ is small we can ignore the higher-order terms on the right of (2.6a), which then gives

δxᵣ₊₁ ≈ δxᵣF′(x).  (2.6b)
Repeated application of this formula gives

δxᵣ₊₁ ≈ {F′(x)}^{r+1}δx₀.

Hence if |F′(x)| < 1 we obtain the result that δxᵣ tends to zero as r tends to infinity, i.e. the iterative procedure converges.
A more rigorous derivation is the following. Instead of the infinite Taylor series which led to (2.6a) we use the Taylor series with remainder (or equally the mean value theorem for derivatives)†:

F(x − δxᵣ) = F(x) − δxᵣF′(ξ),

where ξ lies in the range x to xᵣ = x − δxᵣ. It is assumed that F′(ξ) exists in this range and that F(x) is continuous in the range including the end-points. Then (2.5a, b) give, instead of (2.6),

δxᵣ₊₁ = δxᵣF′(ξ),

where ξ lies between x and xᵣ. Suppose that

|F′(ξ)| < M  (2.7)

for the range of ξ in which we are interested, where M is a constant independent of r. Then

|δxᵣ₊₁| < M|δxᵣ| < M^{r+1}|δx₀|.  (2.8)

If M < 1 then δxᵣ tends to zero as r tends to infinity. A sufficient condition for convergence of the iterative procedure (2.4) is therefore that |F′(ξ)| < 1 for ξ near the required root.
It is instructive to illustrate this result graphically, as in Fig. 2.1(a-d), where we have drawn y = x and y = F(x) for several different functions F(x). Starting from an arbitrary value x₀ of x on the line y = x we move vertically to the curve y = F(x) and then horizontally back to the line y = x. From (2.4) this gives x₁. Repetition of the procedure gives x₂, x₃, ... in succession. It is easy to see whether the procedure converges or not. From the diagrams in Fig. 2.1 it is clear that convergence to the required root can occur only in cases (a) and (b), namely when |F′(x)| < 1 and the first approximation is sufficiently near the required root.

† J. M. Hyslop, Real Variable, Oliver and Boyd (1960), § 6.2.
Even if there are mistakes in the intermediate steps the above iterative procedure will give the correct final answer provided that the iteration ultimately converges to the required root (cf. Ex. 2.14). On the other hand mistakes may delay convergence or cause convergence to another root. One simple check that should be applied mentally in hand computation is that, as the calculation proceeds, the xᵣ tend to a limiting value monotonically, as in the case of Fig. 2.1(a), or lie alternately on either side of a limiting value, as in the case of Fig. 2.1(b).
We now return to consideration of the iterative procedure (2.2), for which F(x) = 0.2x² + 0.4, F′(x) = 0.4x. There are two roots α ≈ 0.44 and β ≈ 4.6, so that |F′(α)| < 1 and |F′(β)| > 1. Hence the above theory tells us that if we try iterating with an initial approximation near α the procedure will converge to this root, but if we start with an initial approximation near β the procedure will not converge to β. The iteration can be examined graphically as in Fig. 2.2(a). Proceeding as for Fig. 2.1, it can be shown that if we choose an initial approximation x₀ such that |x₀| < β then the iterative procedure will converge to the smaller root α, but if |x₀| > β the procedure will diverge.
There are of course a large number of ways in which we can rearrange an equation f(x) = 0 in the form x = F(x), and each rearrangement will give an iterative procedure of type (2.4) with, in general, different convergence properties. Thus if we rearrange (2.1) in the form x = 5 − (2/x) we have the iterative formula

xᵣ₊₁ = 5 − (2/xᵣ).
In this case F′(x) = 2/x², so that |F′(α)| > 1, |F′(β)| < 1, and the procedure will converge to the larger root β if we start with an initial approximation near the larger root, but it will not converge to the smaller root α if we start with an approximation near to α. The situation is illustrated in Fig. 2.2(b), from which it is clear that the procedure ultimately converges to the larger root whatever first approximation x₀ is chosen.

[Fig. 2.1. Diagrams illustrating the iterative procedure xᵣ₊₁ = F(xᵣ): (a) 0 ≤ F′(x) < 1; (c) F′(x) > 1; (d) F′(x) < −1.]
Still another example is illustrated in Fig. 2.2(c), which shows the iterative solution of the trivial equation x = −x³. The procedure converges to the root x = 0 if we choose |x₀| < 1, but diverges if we choose |x₀| > 1.
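The selective behaviour of the rearrangements can be demonstrated directly (a sketch; β = 2.5 + √4.25 is the larger root of (2.1)):

```python
# x_{r+1} = 5 - 2/x_r converges to the larger root beta from a wide
# range of starting values, even one closer to the smaller root alpha.
beta = 2.5 + 4.25 ** 0.5       # 4.561553...

x = 1.0                        # a start much nearer alpha than beta
for _ in range(40):
    x = 5 - 2 / x
print(round(x, 6))             # 4.561553
```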
The discussion given above has illustrated the following points in connection with iterative procedures:

(i) If we wish to obtain a specific root of an equation it may be necessary to try various iterative methods obtained by rearranging the equation f(x) = 0 in the form x = F(x) in different ways until we obtain a sequence which converges to the required root.

[Fig. 2.2. Examples of the iterative procedure xᵣ₊₁ = F(xᵣ): (a) xᵣ₊₁ = 0.2xᵣ² + 0.4; (b) xᵣ₊₁ = 5 − (2/xᵣ); (c) xᵣ₊₁ = −xᵣ³.]
(ii) It is important from a practical point of view that convergence should be as rapid as possible. For rapid convergence of the procedure (2.4) discussed in this section it is clear from (2.8) that |F′(x)| should be as small as possible, and x₀ should be chosen as near to the required root as possible.

(iii) In considering whether an iteration is converging or not it may be necessary to ignore the first few iterations, since the procedure may appear to diverge initially, even though it ultimately converges.
(iv) In the iterative procedure considered in this section the correct final answer is obtained even though there are mistakes in the intermediate steps, provided that the iteration ultimately converges to the desired root.

The rate of convergence of iterative methods of the type discussed in this section tends to be rather slow, and in practice a device is used to accelerate the convergence, as discussed in § 2.4 below.
§ 2.3. The Newton-Raphson iterative method. Let X be an approximate root of f(x) = 0 and suppose that the exact root is x = X + h, so that h is a small quantity. Then f(X + h) = 0 and, expanding by Taylor's theorem, we have

f(X + h) = 0 = f(X) + hf′(X) + ½h²f″(X) + ….  (2.9)

If h is small and f(x) is a well-behaved function in the neighbourhood of the required root, the terms in h², h³, ... can be neglected, so that

h ≈ −f(X)/f′(X).

The Newton-Raphson iterative formula is obtained by writing this result as an iterative procedure:

xᵣ₊₁ = xᵣ − f(xᵣ)/f′(xᵣ),  (2.10)

where, as in § 2.2, we start with an estimate x₀ and calculate in succession x₁, x₂, ….
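In programmed form the procedure (2.10) is only a few lines (a generic sketch; the tolerance and iteration limit here are illustrative choices, not part of the original method):

```python
# Newton-Raphson: x_{r+1} = x_r - f(x_r)/f'(x_r).
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:   # stop when successive iterates agree
            return x
    raise RuntimeError("iteration did not converge")

# the quadratic of section 2.2: x**2 - 5*x + 2 = 0
root = newton(lambda x: x * x - 5 * x + 2, lambda x: 2 * x - 5, 0.0)
print(round(root, 6))         # 0.438447
```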
The geometrical interpretation of the Newton-Raphson method can be seen from Fig. 2.3(a). Suppose that the curve y = f(x) cuts the x-axis at A, so that OA = x. If xᵣ = OC, then CD = f(xᵣ) and tan CBD = f′(xᵣ). Also BC = CD cot CBD. Hence

OB = OC − BC = xᵣ − f(xᵣ)/f′(xᵣ),

so that OB = xᵣ₊₁, from (2.10). The diagram indicates that once the procedure starts to converge towards a root, it will converge from one side. Under exceptional circumstances this may not be true, for instance if f″ = 0 or (f″/f′) < 0 at the root. Also the first few iterates may oscillate on either side of a root, as in Fig. 2.3(b).
Difficulties may arise if f′(x) = 0 near a root of f(x) = 0 which it is required to determine, particularly when the equation has two nearly equal roots in this region. These difficulties can often be clarified by plotting the curve y = f(x) for values of x near the required root and examining the geometrical interpretation of the Newton-Raphson procedure. Thus in Fig. 2.3(b) if we start from the abscissa P we certainly do not converge to either of the roots at E, F, though of course the procedure converges to another root of the equation. It is usually sufficient to ensure that the xᵣ approach the required root of the equation from the side opposite to that on which f′(x) = 0. If f′(x) is small near the required zero of f(x) then it may be necessary to compute f(xᵣ) and f′(xᵣ) to a high degree of accuracy in order to obtain an accurate estimate of xᵣ₊₁. This indicates that the equation is "ill-conditioned". The meanings of the words "ill-conditioned" and "accurate" are discussed in connection with polynomials in § 2.6.

[Fig. 2.3. Convergence of the Newton-Raphson iterative procedure.]
To examine the rate of convergence of the Newton-Raphson method we use the results in § 2.2, (2.4)-(2.6). To apply these to the present case we must set

F(x) = x − f(x)/f′(x),

so that

F′(x) = f(x)f″(x)/{f′(x)}² = 0,
F″(x) = f″(x)/f′(x) ≠ 0, in general,

where in both equations we have used the fact that f(x) = 0. Hence from (2.6a), since δxᵣ = x − xᵣ,

x − xᵣ₊₁ = −½p(x − xᵣ)² + …,  where p = f″/f′.  (2.11)
If xᵣ is nearly equal to the required root x, the higher-order terms can be neglected and the error in xᵣ₊₁ is proportional to the square of the error in xᵣ. This means that the rate of convergence of the Newton-Raphson procedure is very much more rapid than the rate of convergence of the simple iterative method discussed in § 2.2. This is illustrated in the following examples, where the final estimate of x is obtained by repeating the iteration until xᵣ₊₁ and xᵣ agree. Care may have to be exercised when using this criterion when working to a fixed number of decimal places, as discussed in § 2.6 below. Some general aspects of rates of convergence, including estimation of the rate of convergence, are discussed in the next section.
Ex. 2.1. Find the roots of x² − 5x + 2 = 0 to five decimal places by the Newton-Raphson procedure.

If we choose f(x) = x² − 5x + 2 then (2.10) gives

xᵣ₊₁ = xᵣ − (xᵣ² − 5xᵣ + 2)/(2xᵣ − 5) = (xᵣ² − 2)/(2xᵣ − 5).

On taking x₀ = 0 we find x₁ = 0.4 and x₂ = 0.438 095, x₃ = 0.438 447, x₄ = 0.438 447. The extremely rapid convergence is evident. (Compare results (2.3) from the procedure in § 2.2.) On starting from x₀ = 4.0 we find x₁ = 14/3 and x₂ = 4.564 104, x₃ = 4.561 554, x₄ = x₅ = 4.561 553. In this example it is convenient to work to six decimal places throughout. In calculating x₂ it would have been sufficient to have worked to three decimal places, but we did not know this beforehand. In more complicated examples it may be convenient to work to a small number of decimal places in the earlier iterations until the procedure starts to converge rapidly.
When using the Newton-Raphson procedure it may not be necessary to recompute f′(x) at each iteration. It may be sufficient to use an approximate value of f′(x). Thus in the above example we can write

xᵣ₊₁ = xᵣ − (xᵣ² − 5xᵣ + 2)/(2xᵣ − 5)*,

where the star denotes an approximate value. On choosing x₀ = 0 we find x₁ = 0.4. To obtain a simple formula we set (2xᵣ − 5)* = −4 and use the iteration

xᵣ₊₁ = xᵣ + ¼(xᵣ² − 5xᵣ + 2) = ¼(xᵣ² − xᵣ + 2).

With x₁ = 0.4 this gives

x₂ = 0.44,  x₃ = 0.4384,  x₄ = 0.438 449.

These figures should be compared with those obtained previously. Although the rate of convergence is of course not so rapid as in the true second-order procedure, it is still remarkably good considering the crudeness of the approximation used for 2xᵣ − 5. Although we have approximated to f′(x) it is essential to calculate f(x) to full accuracy. It often happens in iterative processes that we can be relatively crude in certain parts of the calculation, but accuracy must be maintained in certain vital parts of the calculation.
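The fixed-slope variant is equally short in programmed form (a sketch of the device above, with the constant −4 in place of 2xᵣ − 5):

```python
# x_{r+1} = (x_r**2 - x_r + 2)/4: f is computed exactly, f' is replaced
# by the constant -4, so the limit is still the exact root.
alpha = 2.5 - 4.25 ** 0.5

x = 0.4                        # x_1 from the exact formula, as in the text
history = []
for _ in range(20):
    x = (x * x - x + 2) / 4
    history.append(round(x, 6))

print(history[:3])             # [0.44, 0.4384, 0.438449]
print(abs(x - alpha) < 1e-12)  # True
```

Because f is evaluated exactly, any fixed point of this iteration satisfies x² − 5x + 2 = 0 exactly; only the speed of approach is affected by the crude slope.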
Ex. 2.2. Find √11 to five decimal places by an iterative procedure.

We wish to solve the equation x² = a. On writing f(x) = x² − a the Newton-Raphson formula gives

xᵣ₊₁ = xᵣ − ½(xᵣ − a/xᵣ) = ½(xᵣ + a/xᵣ),

where the first form is convenient for hand computation since xᵣ − (a/xᵣ) is a small quantity. If we choose x₀ = 3 then

x₁ = 3.33,  x₂ = 3.316 65,  x₃ = 3.316 62,

where x₃ is correct to five decimals. This is a very convenient method for finding square roots on a desk calculator.
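As a program the square-root iteration reads (a sketch; the fixed count of six iterations is an arbitrary safe choice for this example):

```python
# x_{r+1} = (x_r + a/x_r)/2, the Newton-Raphson iteration for x**2 = a.
def sqrt_newton(a, x0, iters=6):
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + a / x)
    return x

print(round(sqrt_newton(11, 3.0), 5))   # 3.31662
```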
Ex. 2.3. Solve x = exp (1/x) to five decimal places.

Various formulae can be obtained for the iteration by rearranging the equation in different ways. If we choose f(x) = x − exp (1/x) the Newton-Raphson formula gives

xᵣ₊₁ = xᵣ − xᵣ²{xᵣ − exp (1/xᵣ)}/{xᵣ² + exp (1/xᵣ)}.  (i)

If we write x = 1/y, so that f(y) = 1 − y exp y, we find

yᵣ₊₁ = yᵣ + {exp (−yᵣ) − yᵣ}/(1 + yᵣ).  (ii)

If we take logarithms and write f(x) = 1 − x ln x, then

xᵣ₊₁ = xᵣ + (1 − xᵣ ln xᵣ)/(1 + ln xᵣ).  (iii)

Forms (ii) and (iii) are more convenient than (i).

It is found graphically that there is only one root, near x = 1.8 (or y = 0.6). With initial approximation y₀ = 0.6 formula (ii) gives successively 0.568, 0.567 144, i.e. x = 1.763 22. With initial approximation x₀ = 1.8 formula (iii) gives successively 1.7634, 1.763 22. The final iteration should of course be repeated as a check.
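Formula (iii) is easily verified by computation (a sketch; six iterations are far more than the hand calculation needed):

```python
# x_{r+1} = x_r + (1 - x_r*ln x_r)/(1 + ln x_r), from f(x) = 1 - x*ln x.
from math import log, exp

x = 1.8
for _ in range(6):
    x = x + (1 - x * log(x)) / (1 + log(x))

print(round(x, 5))                 # 1.76322
print(abs(x - exp(1 / x)) < 1e-9)  # True: x satisfies x = exp(1/x)
```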
§ 2.4. General aspects of iterative procedures. If an iterative procedure for finding a quantity x by calculating successively x₁, x₂, ... is such that

x − xᵣ₊₁ = aₖ(x − xᵣ)ᵏ + aₖ₊₁(x − xᵣ)ᵏ⁺¹ + …  (2.12)

where aₖ is a non-zero constant, we say that the iterative process is of the kth order. We see from (2.6b) that the simple procedure in § 2.2 gives in general a first-order process, and from (2.11) that the Newton-Raphson procedure is second order.

The order of the iterative procedure xᵣ₊₁ = F(xᵣ) for finding a root x can be obtained by expanding F(xᵣ) in a Taylor series around x, as in (2.6a). If the first non-zero derivative of F(x) is of the kth order then the iterative procedure is also of the kth order, since (2.12) and (2.6a) then agree, with k!aₖ = (−1)ᵏ⁺¹F⁽ᵏ⁾(x). If F(x) has a simple form it may be easy to determine the order of an iterative procedure by finding the order of the first non-vanishing derivative, as in the following example. If F(x) is complicated it may be simpler to obtain the expansion (2.12) directly, as in the argument leading to (2.17) below.
Ex. 2.4. Show that x_{r+1} = x_r(3 - 3a x_r + a^2 x_r^2) is a third-order process for the calculation of 1/a.
We have
F(x) = 3x - 3ax^2 + a^2 x^3,  F'(x) = 3(ax - 1)^2,
F''(x) = 6a(ax - 1),  F'''(x) = 6a^2.
The equation x = F(x) has three roots, namely x = 0, 1/a, 2/a. If x = 0 the process is first-order with F'(x) = 3 > 1, and therefore the iteration will never converge to x = 0. Similarly if x = 2/a the process is first-order with F'(x) = 3 > 1 and the iteration will not converge to x = 2/a. The remaining possibility is x = 1/a and then F'(x) = F''(x) = 0, F'''(x) ≠ 0, so that we have a third-order process for the calculation of 1/a.
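A short sketch of this iteration in Python (names are illustrative, not from the text). Writing e_r = 1 - a x_r, a little algebra gives e_{r+1} = e_r^3, so the error is cubed at every step, which is the third-order behaviour shown above:

```python
# Ex. 2.4: x_{r+1} = x_r(3 - 3a x_r + a^2 x_r^2); since 1 - a x_{r+1} = (1 - a x_r)^3,
# each step roughly triples the number of correct digits.
def reciprocal(a, x0, iters=6):
    x = x0
    for _ in range(iters):
        x = x * (3 - 3 * a * x + (a * x) ** 2)
    return x

print(reciprocal(7.0, 0.1))    # converges to 1/7 = 0.142857...
```

The starting value must satisfy |1 - a x_0| < 1, otherwise the cubed error grows instead of shrinking.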
We shall use the following notation for the approximate relations between the errors in x_r and x_{r+1} for first- and second-order procedures, respectively:
x - x_{r+1} = M(x - x_r),   (2.13)
x - x_{r+1} = N(x - x_r)^2.   (2.14)
A second-order process is extremely rapidly convergent once the error becomes small enough. If the error at any stage of an iteration is δ, the errors in successive iterates in a second-order process are
Nδ^2, N^3δ^4, N^7δ^8, ..., N^{-1}(Nδ)^{2^r}, ....
Hence the procedure is convergent if |Nδ| < 1, and the smaller |Nδ| is, the more rapid the convergence. The term |Nδ|^{2^r} ultimately goes to zero extremely rapidly as r increases, for any value of |Nδ| less than unity. If Nδ = 0.1, then the sequence (Nδ)^2, (Nδ)^4, (Nδ)^8, ... gives 0.01, 0.0001, 0.000 000 01, .... If Nδ = 0.9 we have, approximately, the sequence 0.81, 0.656, 0.430, 0.185, 0.034, 0.0012, 0.000 001 4, .... The convergence is initially slow but ultimately rapid. Once the state of rapid convergence is reached, each iteration will roughly double the number of correct figures in the answer, but it is misleading to state without qualification that second-order processes double the number of significant figures per iteration. For instance the iteration may be started with a value of x so far from the required root that the second-order analysis may not apply to the first few iterations. In this case the initial convergence may be very slow. In fact one of the problems in applying second-order processes is to arrange that the state of rapid convergence is reached after as few iterations as possible (compare Ex. 3.1, Program 3.5).
In first-order processes the behaviour of the error is completely different from that just discussed in connection with second-order processes. If the initial error is δ, the errors in a first-order process are successively
Mδ, M^2δ, M^3δ, ..., M^rδ, ... .
The error is reduced by a constant factor M in each iteration. This means that there is no ultimate extremely rapid convergence as in second-order processes.
Because of the comparatively slow convergence of first-order processes it is often desirable to improve the rate of convergence by the following important method. From (2.13) we assume that, for the first-order procedure under consideration,
X - x_r = M(X - x_{r-1}),
X - x_{r-1} = M(X - x_{r-2}),
where X is an approximation to the root. On eliminating M and rearranging we obtain an expression for X in terms of x_r, x_{r-1}, x_{r-2} which will be, in general, a better approximation to the root than any of the iterates. We find
X = (x_r x_{r-2} - x_{r-1}^2)/(x_r - 2x_{r-1} + x_{r-2})
  = x_r - (x_r - x_{r-1})^2/(x_r - 2x_{r-1} + x_{r-2}),   (2.15)
where the second form is more convenient for calculation (compare Ex. 1.2). The use of (2.15) is usually called Aitken's δ^2-process since A. C. Aitken popularised its applications in numerical analysis. It goes back at least as far as Kummer in 1837. As an example consider the figures (2.3):
x_2 = 0.432,  x_3 = 0.437 32,  x_4 = 0.438 25.   (2.16a)
Substitution in (2.15) gives
X = 0.438 25 - (0.000 93)^2/(-0.004 39) = 0.438 45.   (2.16b)
If we use this value as a new x_0 in (2.2) and start a new sequence of iterations we find that the first iterate repeats the value 0.438 45, so that the iteration converges to this estimate of the root.
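Equation (2.15), in its second form, translates into a one-line function; the sketch below (names are illustrative) reproduces (2.16b):

```python
# Aitken's delta-squared process, second form of (2.15).
def aitken(x0, x1, x2):
    return x2 - (x2 - x1) ** 2 / (x2 - 2 * x1 + x0)

# The iterates (2.16a) quoted in the text:
print(round(aitken(0.432, 0.43732, 0.43825), 5))   # 0.43845
```

For an exactly geometric error sequence x_r = X - M^r the formula recovers X exactly, which is why it works so well on first-order iterations.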
(It is convenient at this point to anticipate notation introduced in § 7.2 in order to explain the name "δ^2-process". Suppose that the numbers (2.16a) are arranged in a difference table:

              x          δ         δ^2
x_{r-2}    0.432 00
                        532
x_{r-1}    0.437 32              -439
                         93
x_r        0.438 25

The quantity x_r - x_{r-1} (= 93) is the first difference δx_{r-1/2}, and x_r - 2x_{r-1} + x_{r-2} (= -439) is the second difference δ^2 x_{r-1}, in units of the fifth decimal place. These numbers occur in (2.16b) above. The name "δ^2-process" comes from this connection with differences.)
This type of procedure, namely using several successive iterates to obtain an improved estimate of the root, which is then used as an initial approximation for a further sequence of iterations, is called accelerating the convergence. It must be remembered that in the derivation of formulae for accelerating convergence we assume that round-off errors are negligible. The formulae will not be valid if round-off errors are comparable with the differences between successive iterates. Although we stated at the end of § 2.2 that mistakes are usually not very important in iterative procedures this is obviously no longer true when accelerating procedures are used. An accelerating procedure may make convergence worse if the assumptions on which the procedure is based are not satisfied. Thus Aitken's δ^2-process is based on (2.13), which assumes a first-order procedure. Consider the following figures from Ex. 2.1 illustrating the Newton-Raphson second-order procedure:
x_1 = 4.666 667,  x_2 = 4.564 104,  x_3 = 4.561 554.
The δ^2-process indicates that x_3 is in error by 0.000 065 whereas the error is in fact only 0.000 001.
To examine the convergence of Aitken's process, suppose that a first-order procedure gives three successive iterates x_0, x_1, x_2 which are used to obtain a new initial value x_0^(1) by means of (2.15). From x_0^(1) we deduce x_1^(1), x_2^(1) by the first-order procedure and then use (2.15) to obtain x_0^(2), and so on. From the definition of a first-order procedure we have
x - x_1 = a_1(x - x_0) + a_2(x - x_0)^2 + ...,
x - x_2 = a_1(x - x_1) + a_2(x - x_1)^2 + ...
        = a_1^2(x - x_0) + a_1 a_2(1 + a_1)(x - x_0)^2 + ...,
where a_1, a_2 are constants. Hence
x_2 - 2x_1 + x_0 = -(1 - a_1)^2(x - x_0) + a_2(2 - a_1 - a_1^2)(x - x_0)^2 + ...,
(x_2 - x_1)^2 = a_1^2(1 - a_1)^2(x - x_0)^2 + 2a_1 a_2(1 - 2a_1 + a_1^3)(x - x_0)^3 + ....
On substituting these results in (2.15) with r = 2 we find
x - x_0^(1) = C(x - x_0)^2,   (2.17)
where C is a constant. Similarly, in general,
x - x_0^(n+1) = C(x - x_0^(n))^2,
so that Aitken's δ^2-process for accelerating convergence is second-order.
The question of whether to use a first-order process with acceleration of convergence, or the Newton-Raphson second-order process, depends on the particular equation considered. As a broad generalisation, if f'(x) has a simple form then the Newton-Raphson procedure will be preferable, but in complicated examples the accelerated first-order procedure may be simpler. Second-order processes are so quickly convergent that it is seldom necessary to consider third- or higher-order processes, although procedures of arbitrarily high order can be constructed for any given iterative problem.
When using a first-order iterative procedure it is often useful to estimate the quantity M in (2.13) after each iteration. By subtracting the equations
x - x_r = M(x - x_{r-1}),  x - x_{r-1} = M(x - x_{r-2}),
we find
M = (x_r - x_{r-1})/(x_{r-1} - x_{r-2}) = 1/t,   (2.18a)
say, where t, the reciprocal of M, has been introduced since it is usually more convenient to work in terms of t rather than M. By inspection we see that equation (2.15) for the δ^2-process can be written in terms of t as
X = x_r + (x_r - x_{r-1})/(t - 1).   (2.18b)
As an example the figures in (2.16a) give
t = 532/93 ≈ 6,
X = 0.438 25 + (1/5)(0.000 93) = 0.438 44.
From (2.6) we see that, when solving the equation x = F(x), the theoretical value of M is F'(x). In this example the theoretical value of t is therefore t = 1/M = 1/(0.4x) ≈ 6. However in practice it is useful to compute the empirical value of t after each iteration, since if the successive estimates of t tend to a constant value this confirms that δ^2-extrapolation will be valid.
In a similar way when using a second-order procedure it is useful to estimate N in (2.14) after each iteration. Since a second-order procedure converges very rapidly we estimate N by replacing x by x_r in the equation
x - x_{r-1} = N(x - x_{r-2})^2.
This gives
N ≈ (x_r - x_{r-1})/(x_r - x_{r-2})^2.   (2.18c)
Similarly to estimate x we assume that we can replace x by x_r on the right of the following equation:
x - x_r = N(x - x_{r-1})^2.
This gives
x = x_r + N(x_r - x_{r-1})^2,   (2.18d)
where it will be assumed that the value of N is estimated by means of (2.18c). In terms of the quantity t defined in (2.18a) this formula becomes
x = x_r + (x_r - x_{r-1})/(1 + t)^2,   (2.18e)
though now t will not tend to a constant value independent of r as r increases. As a numerical example consider the following figures from Ex. 2.1:
x_1 = 4.666 667,  x_2 = 4.564 104,  x_3 = 4.561 554.
Then
N ≈ 0.0025/(0.1)^2 = 1/4,
x - x_4 ≈ (1/4)(0.0025)^2 = 0.000 001 6,
which indicates that x_4 should be correct to within about two digits in the sixth decimal place. Alternatively we calculate
t = 0.1/0.0025 = 40,
x - x_4 ≈ 0.0025/41^2 = 0.000 001 6,
as before. The above formulae should not be applied until the iterative procedure has started to converge systematically.
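The estimates (2.18a) to (2.18e) can be checked mechanically; this sketch (variable names are illustrative) applies them to the Ex. 2.1 iterates quoted above:

```python
# Error estimates (2.18a) to (2.18e) applied to the Ex. 2.1 iterates.
x1, x2, x3 = 4.666667, 4.564104, 4.561554

N = (x3 - x2) / (x3 - x1) ** 2        # (2.18c)
err = N * (x3 - x2) ** 2              # (2.18d): estimated error x - x_4
t = (x2 - x1) / (x3 - x2)             # (2.18a)
err_t = (x3 - x2) / (1 + t) ** 2      # (2.18e): the same estimate via t

print(N, t)        # N about -0.23, t about 40
print(err, err_t)  # both about -1.5e-6
```

Grouping the differences exactly as in (2.18c) gives N ≈ 0.23 rather than the rounded 1/4 used in the text; the resulting error estimate, about 1.5 in the sixth decimal, agrees with the figure quoted above.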
§ 2.5. Real roots of polynomials. The general polynomial equation of degree n is (assuming a_0 ≠ 0):
P_n(x) = a_0 x^n + a_1 x^{n-1} + ... + a_n = 0.   (2.19)
The following properties of polynomial equations should be borne in mind†:
(i) A polynomial equation of degree n has exactly n roots. (A factor (x - α)^m gives a repeated root which must be counted m times.)
(ii) If the equation has real coefficients, complex roots occur in pairs. If p + iq is a root then p - iq is a root. An equation of odd degree has at least one real root.
(iii) If the roots are x_1, x_2, ... x_n then
x_1 + x_2 + ... + x_n = -(a_1/a_0),
x_1 x_2 ... x_n = (-1)^n (a_n/a_0),
Σ x_1 x_2 ... x_r = (-1)^r (a_r/a_0),  r = 2 to n-1.
In this equation the sum is taken over all possible products of r distinct x_i chosen from the n roots.
(iv) If one root is much smaller than the others then it is given approximately by a_{n-1} x + a_n = 0. Similarly the two smallest roots are often given approximately by the roots of a_{n-2} x^2 + a_{n-1} x + a_n = 0. If one root is much larger than the others it is given approximately by a_0 x + a_1 = 0. The two largest roots are often given approximately by a_0 x^2 + a_1 x + a_2 = 0.
† H. W. Turnbull, Theory of Equations, 5th Edition (Oliver and Boyd, 1957).
The Newton-Raphson formula can be applied to the polynomial equation (2.19). If u is an approximate root of this equation then an improved root is given by
u - {P_n(u)/P'_n(u)}.   (2.20)
It is desirable to systematise the calculation of P_n(u)/P'_n(u). We consider only real u. Suppose that division of P_n(x) by x - u gives
P_n(x) = (x - u)Q_{n-1}(x) + B.
Then
P_n(u) = B.
Also
P'_n(x) = Q_{n-1}(x) + (x - u)Q'_{n-1}(x).
Hence
P'_n(u) = Q_{n-1}(u).
Suppose that division of Q_{n-1}(x) by x - u gives
Q_{n-1}(x) = (x - u)R_{n-2}(x) + D.
Then
P'_n(u) = Q_{n-1}(u) = D.
From (2.20) the improved value of u is therefore given by u - B/D.
The calculation of B by long-hand division, which also gives the coefficients of Q_{n-1}(x), proceeds with b_0 = a_0, b_1 = a_1 + b_0 u, b_2 = a_2 + b_1 u, etc. The evaluation of B is essentially the evaluation of the polynomial P_n(u) by nested multiplication:
P_n(u) = [{(a_0 u + a_1)u + a_2}u + a_3]u + ...,
mentioned in Ex. 1.4. It is convenient to tabulate the calculation in the form given below, where b_n = B = P_n(u). We include also the calculation required for P'_n(u) = D, given by c_{n-1} in the table:

a_0   a_1     a_2     a_3    ...   a_{n-1}     a_n
      b_0 u   b_1 u   b_2 u  ...   b_{n-2} u   b_{n-1} u
b_0   b_1     b_2     b_3    ...   b_{n-1}     b_n        (2.21)
      c_0 u   c_1 u   c_2 u  ...   c_{n-2} u
c_0   c_1     c_2     c_3    ...   c_{n-1}

The sequence is to calculate in succession b_0, b_0 u, b_1, b_1 u, b_2, etc. When the student has gained some practice it will be possible to form b_r = a_r + b_{r-1} u in one machine operation without recording b_{r-1} u, and similarly for c_r, so that the second and fourth rows can be omitted in the above table.
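The tabular scheme (2.21) is exactly two passes of nested multiplication, one over the a's and one over the b's; a sketch in Python (illustrative names, using the Ex. 2.5 coefficients) follows:

```python
# Two passes of nested multiplication (table 2.21): the first gives P(u) = b_n,
# the second, applied to the b's, gives P'(u) = c_{n-1}.
def horner_newton_step(a, u):
    b = [a[0]]
    for coeff in a[1:]:
        b.append(coeff + b[-1] * u)      # b_r = a_r + b_{r-1} u
    c = [b[0]]
    for coeff in b[1:-1]:
        c.append(coeff + c[-1] * u)      # c_r = b_r + c_{r-1} u
    return b[-1], c[-1]                  # P(u), P'(u)

# Ex. 2.5: x^5 + 2.653x^4 + 4.512x^3 - 2.043x^2 - 0.263x - 0.251 = 0
a = [1.0, 2.653, 4.512, -2.043, -0.263, -0.251]
u = 0.5
for _ in range(6):
    P, dP = horner_newton_step(a, u)
    u -= P / dP                          # Newton-Raphson correction -B/D
print(round(u, 4))   # 0.5419, the real root
```

At u = 0.5 this reproduces the first table of Ex. 2.5 below: P(u) = -0.132 19 and P'(u) = 2.717 00.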
If several roots are required the roots should be determined in order of increasing absolute value, and it is usually satisfactory (and desirable) to remove them from the polynomial as they are determined. (See Ex. 3.3 where this procedure is discussed in connection with automatic computing.) The polynomial obtained by dividing P_n(x) by x - u is
P_{n-1}(x) = b_0 x^{n-1} + b_1 x^{n-2} + ... + b_{n-1},
so that the coefficients of P_{n-1}(x) have been determined in the course of the Newton-Raphson procedure for finding the roots of P_n(x).
Ex. 2.5. Find the real root of the following equation to four decimal places:
x^5 + 2.653x^4 + 4.512x^3 - 2.043x^2 - 0.263x - 0.251 = 0.
A graph shows that the real root is near 0.5 and we take this as x_0. The procedure leading to the table (2.21), without the second and fourth rows, gives, on using five decimal places throughout for convenience:

1   2.653 00   4.512 00   -2.043 00   -0.263 00   -0.251 00   (0.5
1   3.153 00   6.088 50    1.001 25    0.237 62   -0.132 19
1   3.653 00   7.915 00    4.958 75    2.717 00

1   2.653 00   4.512 00   -2.043 00   -0.263 00   -0.251 00   (0.549
1   3.202 00   6.269 90    1.399 18    0.505 15    0.026 33
1   3.751 00   8.329 20    5.971 91    3.783 73

The first correction is δ_1 = 0.132 19/2.717 00 = 0.049, giving x_1 = 0.549. The second correction is δ_2 = -0.026 33/3.783 73 = -0.006 96, giving x_2 = 0.542 04. A further iteration gives a final value of 0.5419, correct to four decimal places.
§ 2.6. Errors when finding roots of polynomials. When we wish to specify the accuracy of an approximation X to a root x of an equation f(x) = 0 there are at least three different ways in which the word "accuracy" can be interpreted:
(i) Assuming that f(x) is specified exactly, then |X - x| is the absolute accuracy of X and |X - x|/|x| is the relative accuracy of X. We shall normally use the word "accuracy" in one of these senses.
(ii) If we do not know x then the smallness of the quantity f(X) = R, say, gives some indication of the accuracy of X. The quantity R is called the residual. It is most important to realise that small R does not necessarily mean that the absolute or relative error of X is small. This is discussed below.
(iii) If the constants defining the equation f(x) = 0 (for example the coefficients, if f(x) is a polynomial) are not known exactly, then the uncertainty in these constants means that there is an uncertainty in the values of the roots of f(x) = 0. The accuracy of an approximate root can be no greater than the uncertainty produced in the roots by possible variations in the constants defining the equation.
In some cases, with reference to (ii), small values of R can be produced by values of X appreciably different from the exact roots; and with reference to (iii) small variations in constants can produce large variations in the roots. These phenomena often appear together. In either case the equations are said to be ill-conditioned. As an example consider the equation
f(x) = x^2 - 1.064x + 0.283 = 0,   (2.22)
which has roots 0.527, 0.537 to three decimal places. The equation
x^2 - 1.0641x + 0.283 = 0
has roots 0.523, 0.541; changing a coefficient by 1 in 10,000 has changed the roots by 1 in 130. If we substitute x = 0.532 in (2.22) then f(x) = R = -0.000 024; changing x by 1 in 100 produces a residual which is one ten-thousandth of the smallest coefficient. It should be emphasised that these phenomena are inherent in the equation; they appear whatever method is used to solve the equation.
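The perturbation of (2.22) is easy to reproduce; the following sketch (illustrative names) solves the original and perturbed quadratics with the standard formula:

```python
import math

# Roots of x^2 + b x + c = 0 by the quadratic formula.
def quad_roots(b, c):
    d = math.sqrt(b * b - 4 * c)
    return (-b - d) / 2, (-b + d) / 2

r_orig = quad_roots(-1.064, 0.283)    # roots of (2.22): 0.527..., 0.537...
r_pert = quad_roots(-1.0641, 0.283)   # coefficient changed by 1 in 10,000
print(r_orig)
print(r_pert)
print(0.532 ** 2 - 1.064 * 0.532 + 0.283)   # residual at x = 0.532, about -2.4e-05
```

The roots move from (0.5271, 0.5369) to (0.5233, 0.5408), the shift of about 1 in 130 quoted above, while the residual at the midpoint 0.532 is only -0.000 024.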
Since in the above example a change of 0.001 in a root produces a change of only 0.000 005 in R, we need to work to six decimal places in order to verify that a root is accurate to three decimals. To examine the reason for this more closely, we note that in the Newton-Raphson method the correction to the root is -f(x)/f'(x). In the above example, although f(x) = R is small, f'(x) is also small. For example, if we choose x = 0.525,
f(x) = (0.525)^2 - 1.064(0.525) + 0.283 = 0.000 025,
f'(x) = 2(0.525) - 1.064 = -0.014.
We encounter the familiar difficulty that in computing f(x) and f'(x), the cancellation of large numbers gives small numbers, and in order to obtain a moderate degree of accuracy for their quotient we need to work to a large number of decimal places. This is not a criticism of Newton's method. The difficulty will occur in one form or another, whatever method is used to solve the equation.
To illustrate by a numerical example, consider the formula obtained by applying the Newton-Raphson formula (2.10) to (2.22):
x_{r+1} = (x_r^2 - 0.283)/(2x_r - 1.064).
Iteration with x_0 = 0.55, working to four decimal places, gives
x_1 = 0.0195/0.0360 = 0.5417,  x_2 = 0.0104/0.0194 = 0.5361,   (2.23)
and then in succession 0.5366, 0.5326, 0.5833, 0.5575, 0.5451, 0.5382, 0.5403, 0.5361, .... This last number, which is x_10, is in fact identical with x_2 and further iteration will merely repeat the values for x_3 to x_10. Even though
we have used four decimal places, all we can say about the root is that it is near 0.54. Equation (2.23) shows clearly the way in which significant figures are lost in intermediate calculations. Similarly if five decimals are used in the calculation it will be found that the iterates vary between 0.536 50 and 0.537 50 in general, and we can say that the root is approximately 0.537. There is no guarantee that the correct answer lies between the upper and lower limits within which the iterates fluctuate, or even, if an iteration apparently converges, that it converges to the correct answer. Thus in the above example if we iterate with x_0 = 0.50 using four decimal places, we find in succession 0.5156, 0.5244, 0.5263, and all further iterations repeat 0.5263; but we know that the correct answer in this case is 0.5270.
To sum up this discussion: in order to calculate any root to a specified degree of accuracy, an essential and unavoidable degree of accuracy is necessary in the intermediate calculations, and this may be much greater than the final accuracy of the root.
The change δx in a root x produced by a change δa_r in a coefficient a_r can be examined theoretically as follows. The quantity x + δx is a root of the perturbed equation, so that
P_n(x + δx) + δa_r(x + δx)^{n-r} = 0.
Hence to first order, since P_n(x) = 0,
P'_n(x)δx + δa_r x^{n-r} = 0,
or
δx = -δa_r{x^{n-r}/P'_n(x)}.   (2.24)
When {x^{n-r}/P'_n(x)} is large for any root, then by our previous definition this root is ill-conditioned. Of course the linear approximation on the basis of which (2.24) is derived breaks down for ill-conditioned roots. Equation (2.24) is not quantitatively accurate for the polynomial (2.22), for example, but the qualitative indication of ill-conditioning is correct. Formula (2.24) indicates that the smaller roots tend to be better conditioned than the larger roots since, if x_i is the ith root in order of increasing modulus, the power x_i^{n-r} usually increases more rapidly with i than P'_n(x_i).
Some instructive illustrations of the application of (2.24) are given by J. H. Wilkinson†, from whose paper the following examples are quoted. In this paragraph we assume that x_1, x_2, ... x_n represent the n roots of a polynomial of degree n. (Elsewhere in this chapter x_r denotes the rth approximation to a root in an iterative procedure.) If we consider the polynomial equation
P_20(x) = (x + 1)(x + 2)...(x + 20) = 0,   (2.25)
with roots x_s = -s (s = 1 to 20), we find for a change δa_1 in a_1:
δx_20 ≈ 0.4 × 10^8 δa_1,  δx_15 ≈ 0.2 × 10^10 δa_1,  δx_5 ≈ 0.6 δa_1.
The roots x_1 to x_5 are well-conditioned, but the conditioning is extremely poor for the large roots. If we consider the polynomial equation
(x + 1)(x + 2)...(x + 20) + 2^{-23} x^{19} = 0,
which differs from (2.25) by about 10^{-7} in a_1, the roots are x_s = -s to nine decimal places for s = 1 to 4, but x_10, x_11 are -10.10 ± 0.64i, x_15, x_16 are -13.99 ± 2.52i, and x_19, x_20 are -19.50 ± 1.94i. Among other things, this example illustrates in a remarkable way that some roots can be ill-conditioned and others well-conditioned in the same equation.
† The evaluation of the zeros of ill-conditioned polynomials, Numerische Mathematik, 1 (1959), 150-180. This excellent paper will repay careful study.
Although it is true that ill-conditioning is usually associated with the occurrence of nearly equal roots, which in turn implies that f'(x) is small, this remark should be interpreted with care. Thus the equation
x^20 + 1 = 0,
which has 20 roots, all of modulus unity, is comparatively well-conditioned. This can be confirmed by using (2.24).
§ 2.7. Bairstow's method for finding complex roots of polynomials. In this section we extend the Newton-Raphson method for real roots (§ 2.5) to give formulae convenient for dealing with complex roots. Suppose that w is an approximate complex root of P_n(x) = 0, where the polynomial has real coefficients. Consider the quadratic factor (x - w)(x - w̄) = x^2 - px - q, say, where w̄ is the complex conjugate of w, so that p, q are real. We write (cf. (2.19), (2.20))
P_n(x) = (x^2 - px - q)S_{n-2}(x) + Ax + B,   (2.26)
S_{n-2}(x) = (x^2 - px - q)T_{n-4}(x) + Cx + D.   (2.27)
Hence P_n(w) = Aw + B. Also
P'_n(x) = (2x - p)S_{n-2}(x) + (x^2 - px - q)S'_{n-2}(x) + A.
Since p = w + w̄ this gives P'_n(w) = (w - w̄)(Cw + D) + A.
From the Newton-Raphson formula the correction δw to the root w is
δw = -(Aw + B)/{(w - w̄)(Cw + D) + A}.
In general δw is small, so that the numerator of this expression is much less than the denominator, and this implies that the term A in the denominator can be neglected. Hence we can write
δw = -(Aw + B)/{(w - w̄)(Cw + D)}.   (2.28)
When we examine the division of P_n(x) by a quadratic factor x^2 - px - q in detail, it is found convenient to modify the above formulae in the following way. Suppose that in (2.26)
P_n(x) = a_0 x^n + a_1 x^{n-1} + ... + a_n,
S_{n-2}(x) = b_0 x^{n-2} + b_1 x^{n-3} + ... + b_{n-2}.
Then from (2.26), b_0 = a_0 and
b_1 = a_1 + pb_0,
b_r = a_r + pb_{r-1} + qb_{r-2},  (2 ≤ r ≤ n-2),   (2.29)
A = a_{n-1} + pb_{n-2} + qb_{n-3},  B = a_n + qb_{n-2}.
If we extend (2.29) to hold for r = n-1 and n, thus defining b_{n-1} and b_n, we see that
A = b_{n-1},  B = b_n - pb_{n-1}.
Similarly suppose that when we divide x^2 - px - q into S_{n-2}(x) to give (2.27) the coefficients of T_{n-4}(x) are c_0, c_1, ... c_{n-4}, and we define c_{n-3}, c_{n-2} by the same rule as for c_r (r = 2 to n-4). Then we find
C = c_{n-3},  D = c_{n-2} - pc_{n-3}.
On using these results in (2.28), remembering that w + w̄ = p, we obtain
δw = -(b_{n-1}w + b_n - pb_{n-1})/{(w - w̄)(c_{n-3}w + c_{n-2} - pc_{n-3})}.   (2.30)
The computational scheme for the b_r and c_r is as follows (cf. (2.21)):

a_0   a_1    a_2    a_3   ...   a_{n-2}    a_{n-1}    a_n
      pb_0   pb_1   pb_2  ...   pb_{n-3}   pb_{n-2}   pb_{n-1}
             qb_0   qb_1  ...   qb_{n-4}   qb_{n-3}   qb_{n-2}
b_0   b_1    b_2    b_3   ...   b_{n-2}    b_{n-1}    b_n       (2.31)
      pc_0   pc_1   pc_2  ...
             qc_0   qc_1  ...
c_0   c_1    c_2    c_3   ...   c_{n-2}
Instead of working in terms of the correction δw to the root it is convenient to work in terms of δp, δq, corrections to the quadratic factor. We have
(x - w - δw)(x - w̄ - δw̄) = x^2 - (p + δp)x - (q + δq),
so that
δp = δw + δw̄,
δq = -w̄δw - wδw̄ - δwδw̄.
On neglecting the second-order term in the last equation we find, on substituting from (2.30), that
δp = (b_n c_{n-3} - b_{n-1} c_{n-2})/Δ,   (2.32a)
δq = -(b_n c_{n-2} - b_{n-1} E)/Δ,   (2.32b)
where
Δ = c_{n-2}^2 - c_{n-3} E,   (2.32c)
E = pc_{n-2} + qc_{n-3}.   (2.32d)
These are Bairstow's formulae. They are equivalent to the following simultaneous linear equations for δp, δq:
c_{n-2}δp + c_{n-3}δq = -b_{n-1},   (2.33a)
Eδp + c_{n-2}δq = -b_n.   (2.33b)
The derivation given here has shown the very close relationship between Bairstow's formulae and the Newton-Raphson method.
Bairstow's method depends on locating a quadratic factor approximately before iterating to obtain the accurate factor. It should be noted that methods such as root-squaring (Graeffe's process) and the Aitken-Bernoulli process are available for calculating complex roots of polynomials even though no initial approximations are known†.
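The recurrences (2.29) and the corrections (2.32) combine into a complete Bairstow iteration; the sketch below (illustrative names, using the Ex. 2.6 starting factor) is one possible arrangement:

```python
# Bairstow's iteration for a real quadratic factor x^2 - p*x - q of P_n(x),
# following the recurrences (2.29) and the corrections (2.32a)-(2.32d).
def bairstow(a, p, q, iters=20):
    for _ in range(iters):
        b = []                                    # b_0 .. b_n
        for r, ar in enumerate(a):
            b.append(ar + (p * b[-1] if r >= 1 else 0.0)
                        + (q * b[-2] if r >= 2 else 0.0))
        c = []                                    # c_0 .. c_{n-2}, from b_0 .. b_{n-2}
        for r, br in enumerate(b[:-2]):
            c.append(br + (p * c[-1] if r >= 1 else 0.0)
                        + (q * c[-2] if r >= 2 else 0.0))
        E = p * c[-1] + q * c[-2]                 # (2.32d); c[-1] is c_{n-2}
        delta = c[-1] ** 2 - c[-2] * E            # (2.32c)
        dp = (b[-1] * c[-2] - b[-2] * c[-1]) / delta    # (2.32a)
        dq = -(b[-1] * c[-1] - b[-2] * E) / delta       # (2.32b)
        p, q = p + dp, q + dq
    return p, q

# Ex. 2.6: the quintic of Ex. 2.5, initial factor from its last three terms
a = [1.0, 2.653, 4.512, -2.043, -0.263, -0.251]
p, q = bairstow(a, -0.1287, -0.1229)
print(p, q)    # close to -0.19574, -0.08312
```

The first pass reproduces the corrections δp = -0.066 23, δq = 0.041 65 found in Ex. 2.6 below, and the iteration then settles on the factor quoted there.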
Ex. 2.6. Find the complex roots of smallest modulus for the polynomial equation in Ex. 2.5, to four decimal places.
As an approximation to the required quadratic factor we take the last three terms of the polynomial, which give (x^2 + 0.1287x + 0.1229). The procedure in the table (2.31), omitting the second, third, fifth, and sixth rows, which do not need to be recorded, gives

1.000 00   2.653 00   4.512 00   -2.043 00   -0.263 00   -0.251 00
1.000 00   2.524 30   4.064 22   -2.876 30   -0.392 31    0.152 99
1.000 00   2.395 60   3.633 01   -3.638 29

From (2.32) we find
E = 0.021 75,  Δ = 13.158 14,
δp = -0.066 23,  δq = 0.041 65.
On computing the new p, q and repeating the procedure we find

 p           q           b_{n-1}     b_n
-0.194 93   -0.081 25    0.003 26   -0.006 83
-0.195 74   -0.083 12   -0.000 01   +0.000 03

Further iteration indicates that p, q are accurate to one unit in the fifth decimal place, and solution of the corresponding quadratic gives the complex roots -0.0979 ± 0.2712i, to four decimal places.
† See, for instance, Modern Computing Methods, H.M. Stationery Office, 2nd Edn. (1961).
Examples II
Ex. 2.7. Find the real roots of the following equations to four significant figures:
(i) x = cos x,
(ii) x^2 - 1 = sin x,
(iii) x^x = 5.
Ex. 2.8. The equation x^3 + 1.5x - 1.5 = 0 can be written in the forms
(i) x = 1 - (2/3)x^3,
(ii) x = 1.5(1.5 + x^2)^{-1},
(iii) x = (1.5 - 1.5x)^{1/3}.
On applying the method of § 2.2 find which of the corresponding iterative procedures are convergent. Hence determine the real root of the equation to four decimal places.
Ex. 2.9. If the root of the equation cot x = kx between 0 and ½π is small, show that it is given approximately by x ≈ (k + 1/3)^{-1/2}, where k is large. If the root is near ½π, show that it is given approximately by x ≈ ½π(1 + k)^{-1} where k is small. Use these results as first approximations to calculate the roots of the following equations to the appropriate number of significant figures, assuming that the coefficient of x in each case is correct to the last digit given:
(i) cot x = 111.1x,
(ii) cot x = 1.111x,
(iii) cot x = 0.011 11x.
Ex. 2.10. Use the Newton-Raphson method to obtain the following iterative formulae for x = a^{1/n}, (a > 0):
f(x) = x^n - a,  x_{r+1} = n^{-1}{(n - 1)x_r + a x_r^{-n+1}},
f(x) = 1 - (a/x^n),  x_{r+1} = n^{-1}{(n + 1)x_r - a^{-1} x_r^{n+1}}.
If x = x_r + δx_r show that in each case δx_{r+1} = p_r δx_r^2 for suitable p_r. Deduce that if n > 1 or n < -1 the two sequences approach x from opposite sides. Illustrate these results by calculating π^{1/3}, π^{-1/3} by the two sequences, as accurately as possible, given π = 3.141 593.
Ex. 2.11. Show that x_{r+1} = (x_r^3 + 3a x_r)/(3x_r^2 + a) is a third-order procedure for the calculation of √a. Illustrate this result by calculating √11 to nine decimal places, using x_0 = 3. (Compare Ex. 2.2.)
Ex. 2.12. Let y_1 = f(x_1), y_2 = f(x_2). Show that the chord joining (x_1, y_1) to (x_2, y_2) cuts the x-axis at
x_3 = (x_1 y_2 - x_2 y_1)/(y_2 - y_1).
This is the basis of the well-known method of false position† for finding the roots of f(x) = 0: the value of y_3 = f(x_3) is computed and the procedure is repeated using (x_3, y_3) instead of either (x_1, y_1) or (x_2, y_2). It is convenient to choose the two values of x at each stage so that they lie on opposite sides of the root. The method is essentially inverse linear interpolation. Use this method to solve the equations in Exs. 2.1, 2.3.
Ex. 2.13. A method due to J. H. Wegstein‡ which improves the convergence of the iterative procedure of § 2.2 for solving the equation x = F(x) is given by
X_{n+1} = (x_{n+1}X_{n-1} - X_n x_n)/(x_{n+1} - X_n + X_{n-1} - x_n),  (n = 1, 2, ...),
where x_{n+1} = F(X_n), and X_0, X_1 are arbitrary. Show that this procedure is simply a systematic version of the method of false position described in Ex. 2.12.
† E. T. Whittaker and G. Robinson, Calculus of Observations, Blackie (1944), p. 92.
‡ Comm. Assoc. Comput. Mach. 1 (1958), No. 6, p. 9. See also G. N. Lance, Numerical Methods for High Speed Computers, Iliffe (1960), pp. 134-138.
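The false-position step of Ex. 2.12 can be sketched as follows (illustrative names; applied here to the equation of Ex. 2.3):

```python
import math

# False position: the chord through (x1, y1) and (x2, y2) cuts the axis at
# x3 = (x1*y2 - x2*y1)/(y2 - y1); keep the pair of points straddling the root.
def false_position(f, x1, x2, tol=1e-10, max_iter=100):
    y1, y2 = f(x1), f(x2)
    x3 = x1
    for _ in range(max_iter):
        x3 = (x1 * y2 - x2 * y1) / (y2 - y1)
        y3 = f(x3)
        if abs(y3) < tol:
            break
        if y1 * y3 < 0:
            x2, y2 = x3, y3
        else:
            x1, y1 = x3, y3
    return x3

# Ex. 2.3 again: x = exp(1/x), bracketed between 1.5 and 2
root = false_position(lambda x: x - math.exp(1 / x), 1.5, 2.0)
print(round(root, 5))    # 1.76322
```

Replacing whichever endpoint has the same sign as y_3 keeps the root bracketed throughout, which is the choice recommended in Ex. 2.12.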
Ex. 2.14. Consider the following sequential calculation. If we define
a_0 = 1,  b_0 = (1 + x^2)^{1/2},
a_{r+1} = ½(a_r + b_r),  b_{r+1} = (a_{r+1} b_r)^{1/2},  r = 0, 1, 2, ...,
then it can be shown that, as r tends to infinity,
lim a_r = lim b_r = x/arctan x.
This is not an iterative method of the type considered in this chapter, since the limit depends on the starting values. In particular it is vital to avoid mistakes in the above calculation.
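The sequential calculation of Ex. 2.14 is easily checked numerically; in the sketch below (illustrative names) both a_r and b_r settle on x/arctan x:

```python
import math

# a_0 = 1, b_0 = sqrt(1 + x^2); a_{r+1} = (a_r + b_r)/2, b_{r+1} = sqrt(a_{r+1} b_r).
def borchardt(x, iters=40):
    a, b = 1.0, math.sqrt(1 + x * x)
    for _ in range(iters):
        a = 0.5 * (a + b)
        b = math.sqrt(a * b)
    return a

print(borchardt(1.0))          # tends to 1/(pi/4) = 4/pi = 1.2732...
print(1.0 / math.atan(1.0))    # x/arctan x for x = 1
```

The gap between a_r and b_r shrinks by roughly a factor of four per step, so convergence is only first-order, but forty steps are ample for full double precision.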
Ex. 2.15. Let ±x_1, ±x̄_1, ±x_2, ±x̄_2 denote the roots of
x^8 - 2(2p^2 - 0.3)x^6 + (¼Q^4 + 6p^4 - 6p^2 + 3.73)x^4 - 2p^2(2p^4 - 3.7p^2 + 1.7)x^2 + p^4(p^2 - 1)^2 = 0.
Calculations on an automatic computer gave the following results (Q = 100):
for p = 4,  x_1 = 50.159 + 49.842i,  x_2 = 0.155 39 + 0.154 45i,
for p = 10,  x_1 = 51.017 + 49.022i,  x_2 = 1.0139 + 0.9745i.
Verify that for the smaller roots the formula
x_2 ≈ (1 + i)(p/Q)(p^2 - 1)^{1/2}{1 - (6p^4 - 6p^2 + 3.73)/Q^4 - i(2p^4 - 3.7p^2 + 1.7)/[Q^2(p^2 - 1)]}
gives results correct to 1 in 15,000 for p = 4 and 1 in 5000 for p = 10. Derive a formula accurate to 1 in 10,000 for the larger roots. (The moral of this example is that even though numerical results can be obtained by an automatic computer this should not blind one to the possibility of obtaining an adequate approximate answer in analytical form.)
CHAPTER III
ELEMENTARY PROGRAMMING FOR AUTOMATIC
COMPUTERS
§ 3.1. Introduction. For the purposes of this book it is
unnecessary to go into detail regarding the structure of
electronic digital computers; it will suffice to make some
general remarks about the way in which an automatic
computer works. An automatic computer has to be told
precisely, and in detail, the steps which have to be performed
to solve any given problem. From this point of view a
calculation consists of two parts: a series of instructions
specifying the method to be used, and a sequence of numbers
leading from the data of the problem through various steps
in the computation to the answer. It will be assumed that
instructions and data are punched on cards which have to
be fed into the machine, and that the results of a calculation
are printed out by the machine on an output sheet of paper.
For our purposes a digital computer can be considered to
have the following components:
(a) A memory or store in which numbers and instructions
can be stored and from which any required
number can be produced at will. Each number
in the memory is allocated a memory position or
address which can be regarded as a pigeonhole in
which the number is stored.
(b) An input mechanism for transferring the information
on the input cards into the memory.
(c) An output mechanism for transferring information
from the memory onto the output sheet.
(d) An arithmetic unit for carrying out simple basic arithmetical operations such as addition, subtraction, multiplication, division.
(e) A control unit which organises the calculations, i.e. arranges for the input and output of information and the execution of arithmetic operations in the correct sequence as specified by the instructions.
The set of instructions for solving a problem is called
the program. When solving a problem the computer
begins by storing the complete program in its memory
via the input mechanism. It then proceeds to obey the
instructions in an order determined by the program.
The basic instructions which a machine obeys are
usually very simple. For instance it will need at least one,
and perhaps several, basic orders to execute the operation
" Add the two numbers contained in positions a and b in
the memory and place the result in memory position c ".
The reader will appreciate that to program a problem in
terms of " basic machine language " would require con-
siderable background knowledge of individual machines.
It would also be very laborious. Fortunately there is an
increasing tendency to produce systems by means of which
a programmer can write his program in a relatively simple
language resembling ordinary algebra for the formulae and
ordinary terminology for the instructions. This simple
language must of course be interpreted in terms of, or
translated into, basic machine language before the computer
can solve any given problem, but this interpretation or
translation can be done by the machine itself using a
" compiler ". The programmer need only prepare his
program in the simple language. We shall describe a simple
system of this type, devised for teaching purposes.
§ 3.2. Simple programs. Consider the evaluation of
i = 1/√{R² + [2πfL − 1/(2πfC)]²}    (3.1)
for various sets of numbers R, L, C, f. It will be assumed
that each set of R, L, C, f is punched on a separate input
card. A program for this calculation can be written in the
form of five statements as given in Program 3.1. (Instead of
" statement " the words " instruction " or " order " can be
used, if preferred.)
PROGRAM 3.1

Label   Statement
9       Read R, L, C, f
        H = 6·2832fL − 1/(6·2832fC)
        i = 1/(R² + H²)^½
        Print i, R, L, C, f
        Go to 9
The first statement makes the computer read the first
card containing input data. Four numbers are on this
card in the form of four sets of punched holes. The
machine associates four memory positions with the symbols
R, L, C,f and stores the four numbers from the input card
in the corresponding memory positions. From now on,
instead of saying " The number in the memory position
associated with the symbol R " we shall say " The number
associated with R " or, shorter still, " The number R ".
The second statement says " Perform the calculation on
the right-hand side of the equation, using the numbers just
associated with L, C, f. Store the result in a new memory
position associated with the symbol H ". The third state-
ment has a similar meaning: it computes and stores /.
These two statements could of course be written as one
single statement, but it is slightly clearer to write them
separately.
The fourth statement makes the machine print the
computed value of i and the values of R, L, C, f from which
i has been derived.
The last statement transfers control to the statement
numbered 9; in other words it instructs the computer " Go
to the statement 9 and do what it says: then perform the
statement following 9, and so on ". In Program 3.1 this
means that the computer reads the next input card, which
contains four new values of R, L, C, f. The computer
replaces the old values in its memory by the new values and
repeats the sequence of operations. The old values are lost
completely when they are replaced by new values.
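In a present-day language the loop of Program 3.1 might be sketched as follows (Python is used here; the function name and the sample data cards are invented for illustration):

```python
import math

def current(R, L, C, f):
    # The second and third statements of Program 3.1:
    # the reactance H, then the current i from (3.1).
    H = 2 * math.pi * f * L - 1 / (2 * math.pi * f * C)
    return 1 / math.sqrt(R**2 + H**2)

# "Read R, L, C, f ... Go to 9" becomes a loop over the data cards;
# each new card overwrites the old values, exactly as in the text.
cards = [(10.0, 0.1, 1.0e-6, 500.0), (5.0, 0.2, 2.0e-6, 300.0)]
for R, L, C, f in cards:
    print(current(R, L, C, f), R, L, C, f)
```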
Instructions concerning numerical calculations, like the
second and third statements in Program 3.1, are called
arithmetic statements. The other instructions, which
organise the numerical calculations, are called control
statements.
The arithmetic statement
a = (x + 2·431)/y    (3.2)
says " Compute the number a by adding 2·431 to x and
dividing by y ". Hence x and y must have occurred
previously in any program in which this statement occurs,
since the machine must go to memory positions already
allocated to the symbols x and y in order to find the
numbers represented by these symbols. The precise
meaning of the statement depends on whether the symbol a
has occurred previously in the program. If a has occurred
previously there will be a memory position already associated
with a. The meaning of (3.2) is then " Obliterate the
number in the memory position associated with a and
replace it by the number computed from the right-hand side
of the equation". If a has not occurred previously the
meaning is " Allocate a memory position to the new
symbol a and place in this memory position the number
computed from the right-hand side of the equation ". It
is clear that, although arithmetic statements resemble
ordinary algebraic formulae in appearance, the " = " sign
is used in a different sense. No confusion can arise from
this because programs contain only statements (i.e. instruc-
tions), never algebraic formulae.
The rules governing arithmetic statements can be
summarised as follows:
(i) It is essential to have only one symbol on the
left-hand side of a statement. (For instance (3.2)
cannot be written ay = x + 2·431.) This symbol
may have suffixes, e.g. f₁, f_r, a_ij. If the suffix r
is a variable parameter, for example if r = 1, 2,
..., n, the value of r must have been specified
earlier in the program.
(ii) Any symbol on the right-hand side of an arithmetic
statement must have occurred previously in the
program. (Incidentally this is why we have
written 6·2832 in Program 3.1 instead of 2π since
the symbol π has not been defined in the program.
We could of course have written π = 3·1416 as
the second statement and then the third statement
could have been H = 2πfL − 1/(2πfC).)
(iii) Apart from (i) and (ii) any arrangement of symbols
or notation which is self-explanatory according to
the ordinary rules for writing and interpreting
algebraic formulae will be allowed.
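The same conventions survive in present-day languages, where " = " is likewise an instruction rather than an equation. A sketch in Python (the numerical values are invented for illustration):

```python
x = 4.0               # x and y must be given values before they
y = 2.0               #   appear on a right-hand side -- rule (ii)
a = (x + 2.431) / y   # statement (3.2): allocate a and store the result
a = (a + 1.0) / y     # a on both sides: the old value is obliterated
```

The last statement would be nonsense as an algebraic equation; as an instruction it merely overwrites the number associated with a.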
We next consider an extension of Program 3.1. Suppose
that for any given set of R, L, C we wish the machine to
compute first the value of f for which i is a maximum,
namely, from (3.1),

F = 1/[2π(LC)^½],
and then evaluate i for
f_r = rh,  r = 1(1)n,  h = F/n,
where n is a given integer. The notation r = 1(1)n, which
we have already met in Ex. 1.4, means that r starts at unity
and increases by steps of unity until n is reached. A
suitable set of instructions is given in Program 3.2 which
contains only one new type of statement in addition to
those in Program 3.1.
Programs 3.2(a) and 3.2(b) compute the same quantities.
We first of all consider 3.2(a). The interpretation of the
first three statements has been covered by the discussion of
Program 3.1. The fourth statement
Do up to 3 for r = 1(1)n
instructs the machine " Perform all statements which
immediately follow, up to and including the statement
numbered 3, n times, the first time for r = 1, the second
time for r = 2, and so on, and the last time for r = n.
Then go on to the statement following statement number 3."
The first time round, the machine therefore computes and
stores f₁, H₁, i₁, the second time f₂, H₂, i₂, and so on until
finally f_n, H_n, i_n are computed and stored. When this is
done the computer will print the f_s and i_s (s = 1 to n) and
R, L, C. The part r = 1(1)n of the " Do up to " statement
obeys rules which are similar to those governing an arith-
metic statement, namely that the symbol on the left may or
may not have occurred previously in the program, but any
symbols on the right must have occurred previously in the
program. Compared with Program 3.1 the " Print " order
in Programs 3.2(a), (b) has been extended to print a sub-
scripted variable with the range of suffixes clearly indicated.
Commas are used to separate independent variables but the
word " and " is used to separate subscripted variables with
l he same range.
The numbers " 9 " and " 3 " in the " Label " column
of Program 3.2(a) are simply identifying tags attached to
certain statements, so that these statements can be referred
PROGRAM 3.2

(a)

Label   Statement
9       Read R, L, C, n
        F = 0·15915(LC)^−½
        h = F/n
        Do up to 3 for r = 1(1)n
        f_r = rh
        H_r = 6·2832f_rL − 1/(6·2832f_rC)
3       i_r = (R² + H_r²)^−½
        Print R, L, C, h, f_s and i_s for s = 1(1)n
        Go to 9

(b)

Label   Statement
9       Read R, L, C, n
        F = 0·15915(LC)^−½
        h = F/n
        f = 0
        Do up to 3 for r = 1(1)n
        f = f + h
        H = 6·2832fL − 1/(6·2832fC)
3       i_r = (R² + H²)^−½
        Print R, L, C, h, F, i_s for s = 1(1)n
        Go to 9
to in control statements such as " Go to " and " Do up to "
in this example. For labels we use positive integers. These
need not be in increasing order of magnitude. As labels,
instead of numbers we could equally well have used Greek
or Roman letters.
In Program 3.2(a) the machine stores H_r (r = 1 to n)
although these are used only for intermediate calculations,
and are not required after the corresponding i_r have been found.
Storage space can be saved by arranging the computation
as in Program 3.2(b) where for illustrative purposes we
have decided that storage and printing of all the f_r (r = 1 to n)
is also unnecessary. Consider the statement

f = f + h

This means " Take the number in the memory position
associated with f and add h to it. Place the result in the
memory position associated with f ". The old number
which was in f is overwritten. Any subsequent reference to
f applies to the new number which has just been stored in
the memory position associated with f. Program 3.2(b)
should now be self-explanatory.
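A Python sketch of Program 3.2(b) (the function name and return convention are ours; the list i plays the part of the subscripted variable i_r):

```python
import math

def currents_up_to_resonance(R, L, C, n):
    F = 1 / (2 * math.pi * math.sqrt(L * C))   # 0.15915 (LC)^(-1/2)
    h = F / n
    f = 0.0
    i = []
    for r in range(1, n + 1):      # Do up to 3 for r = 1(1)n
        f = f + h                  # f is overwritten each time round
        H = 2 * math.pi * f * L - 1 / (2 * math.pi * f * C)
        i.append(1 / math.sqrt(R**2 + H**2))
    return F, h, i
```

Only the running value of f is stored, exactly as in the program above; the f_r themselves are neither kept nor printed.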
Suppose finally that in addition to the calculation just
considered we wish to evaluate i_r for

f_r = rh,  r = n+1, n+2, ..., N,

where N is defined by the inequalities

i_N < p/R ≤ i_{N−1}   (p < 1, given).   (3.3)
The maximum value of i_r is 1/R, and i_r is monotonically
decreasing for r > n, so that we wish to compute i_r for
increasing r until the value of i_r falls below some prescribed
fraction of the maximum value of i. This can be done as
in Program 3.3 where the first eight statements are the
same as in Program 3.2(b) apart from the addition of a
parameter to the " Read " statement.
The point of this example is that the number N in (3.3)
is not known beforehand. The machine has to determine
N by computing i_s for s = n+1, n+2, ..., each time com-
paring the current value of i_s with p/R until i_s < p/R. This
procedure is controlled by an " If " statement:

If i_s ≥ p/R go to 14

which instructs the machine: " If i_s ≥ p/R transfer control
to statement 14 as in a ' Go to ' statement. If i_s < p/R
perform the statement immediately following the ' If '
statement, namely ' Print . . . '." In the second half of
Program 3.3 we have used s, g instead of r, f. This is not
essential but it emphasises that the program consists of two
distinct parts. The value of s which is printed is the value
assigned to the variable s when the " Print " statement is
reached, and this is the same as the number N in (3.3).
The possibility of programming Yes-No decisions by
means of " If" statements is one of the distinctive features
of performing calculations by means of an automatic com-
puter. The " If" statement allows the machine to control
its own calculations.
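The open-ended second half of Program 3.3 is a natural while-loop in a present-day language. A Python sketch (the function and its arguments are ours):

```python
import math

def steps_past_resonance(R, L, C, n, p):
    """Return N, the first s > n for which i_s < p/R (cf. (3.3))."""
    F = 1 / (2 * math.pi * math.sqrt(L * C))
    h = F / n
    s, g, i = n, F, 1.0 / R        # at f = F the current has its maximum 1/R
    while i >= p / R:              # the "If i_s >= p/R go to 14" loop
        s = s + 1
        g = g + h
        H = 2 * math.pi * g * L - 1 / (2 * math.pi * g * C)
        i = 1 / math.sqrt(R**2 + H**2)
    return s
```

The number of passes round the loop, like N itself, is not known beforehand; the machine decides for itself when to stop.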
From our point of view the main difficulty in program-
ming does not lie in the arithmetical calculations themselves
but in the organisation of the sequence of calculations.
It is often helpful to draw a flow-diagram† to clarify the
logical structure of the program to be used. This is
illustrated for Program 3.3 in Fig. 3.1 which should be self-
explanatory. A flow-diagram consists of boxes connected
by lines. The boxes are essentially of two types. A test
box or If-box has one input line and two output lines. The
output line which is selected depends on whether the answer
to the question in the box is " yes " or " no ". Any other
kind of control or arithmetic statement is placed in an
assertion box. It is convenient to make test boxes oval
and assertion boxes rectangular, as in Fig. 3.1. In contrast
to the relatively limited kinds of statement used in programs,
any kind of statement is allowed in flow-diagrams as long
† An excellent collection of flow-diagrams (or flow-charts) is
contained in Mathematical Methods for Digital Computers, A. Ralston
and H. S. Wilf, Wiley (1960).
as the meaning is clear. A flow-diagram tends to be
descriptive but the logical interconnections are described
precisely. Flow-diagrams can vary from description in
broad outline to description in detail. At the one extreme
the flow-diagram will consist of almost equal numbers of
PROGRAM 3.3

Label   Statement
9       Read R, L, C, n, p
        F = 0·15915(LC)^−½
        h = F/n
        f = 0
        Do up to 3 for r = 1(1)n
        f = f + h
        H = 6·2832fL − 1/(6·2832fC)
3       i_r = 1/(R² + H²)^½
        s = n
        g = f
14      s = s + 1
        g = g + h
        H = 6·2832gL − 1/(6·2832gC)
        i_s = 1/(R² + H²)^½
        If i_s ≥ p/R go to 14
        Print s, i_t for t = 1(1)s, R, L, C, n, p
        Go to 9
assertion and test boxes, with little detail in the assertion
boxes. At the other extreme the programmer can convert
the flow-diagram into a program directly by merely removing
the lines and boxes, and inserting labels and " Go to "
statements as necessary. In the flow-diagram in Fig. 3.1
the control instructions are given in detail but no arithmetic
information is given.
[Fig. 3.1. Flow-diagram for Program 3.3. Start → " Is another input
card available? " (No: Stop machine) → Read in the next values of
R, L, C, n, p → Compute F, h → Compute i_r, r = 1(1)n → Set s = n →
Increase s by 1 → Compute i_s → " Is i_s ≥ p/R? " (Yes: return to
" Increase s by 1 ") → Print s, i_t (t = 1 to s), R, L, C, n, p → return
to Start.]
§ 3.3. Some programs involving iterative procedures. The
problems involved in programming an iterative procedure
for an automatic computer can be illustrated by considering
the solution of equations by means of the Newton-Raphson
procedure. From § 2.3, equation (2.10), the equation
f(x) = 0 is solved by means of the iteration

x_{r+1} = x_r − f(x_r)/f′(x_r),

where x_r is the rth approximation to the root x.
It is necessary to start by obtaining initial approxima-
tions x₀ for each of the roots which we wish to determine.
It may be possible to locate the roots roughly by hand
calculation and include the resulting estimates in the input
data for the machine, but usually it is better to program
some procedure by means of which the machine itself can
make initial estimates of the roots.
The next step is to program the machine to perform the
iteration. There are then two possibilities. The procedure
may converge satisfactorily in which case the machine can
be made to iterate until the difference between successive
iterates is sufficiently small. Alternatively, the procedure
may not converge or may converge extremely slowly, in
which case it is necessary to alter the iteration in some way
or, if desired, stop the iteration altogether.
A useful criterion for convergence is the following.†
We define e_r = x_{r+1} − x_r. When an iterative procedure is
converging towards a root we shall have | e_{r+1} | < | e_r |
until the e_r become so small that they are affected by the
fact that only a finite number of decimal places are used in
the calculation. In simple computations we may find that
x_r = x_{r+1} = x_{r+2} so that e_r = e_{r+1} = 0. If the calculations
are appreciably affected by round-off error, however,
random error fluctuations will eventually give | e_{r+1} | ≥ | e_r |
for some r. (Compare the discussion of random error
† This was pointed out to me by Dr D. J. Wheeler.
fluctuations in § 2.6.) In either case we can use the con-
dition | e_{r+1} | ≥ | e_r | as a criterion that the calculation has
converged to the degree of accuracy allowed by round-off
error. The level of random variations can be regarded as
a noise-level limiting the accuracy of the calculation.
An even simpler criterion can be used if we know that
the x_r vary monotonically with increasing r. Suppose for
example that theoretically x_{r+1} > x_r for all r. Since we are
dealing with numbers specified to a finite number of
decimal places we shall eventually have x_{r+1} ≤ x_r due to
either repetition of the iterated value, or round-off error
and the noise-level of the calculation. This condition
can therefore be taken as a criterion of convergence.
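Both criteria are easy to program. A Python sketch of a Newton-Raphson iteration stopped by the | e_{r+1} | ≥ | e_r | test (the function names and the example equation are ours):

```python
def newton_to_noise_level(f, fprime, x0, max_iter=50):
    # e_r = x_{r+1} - x_r; stop when the corrections no longer shrink,
    # i.e. when the round-off "noise-level" has been reached, or e_r = 0.
    x, e_old = x0, None
    for _ in range(max_iter):
        e = -f(x) / fprime(x)
        x = x + e
        if e == 0.0 or (e_old is not None and abs(e) >= abs(e_old)):
            break
        e_old = e
    return x

# the real root of x^3 - 2x - 5 = 0, starting from x_0 = 2
root = newton_to_noise_level(lambda x: x**3 - 2*x - 5,
                             lambda x: 3*x**2 - 2, 2.0)
```

The max_iter limit guards against a procedure that neither converges nor settles, in the spirit of the r < 20 test of Program 3.6 below.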
Some of these points are illustrated in the following
examples.
Ex. 3.1. Program 3.4 gives a straightforward routine for
finding the square root of a positive real number a by
means of the following formula (Ex. 2.2):
x_{r+1} = ½{x_r + (a/x_r)}.
If the machine is given a negative value of a by mistake
this formula will give nonsensical results. In this case we
make the machine stop. The value of a has been printed
out, and this will indicate the source of the trouble. (This
is the first time we have used the self-explanatory " Stop "
order. We might have used instead an " Alarm " order
to stop the machine and warn the operator that something
has gone wrong with the calculation.) As an initial estimate
of the root we take unity and a single iteration gives
x₀ = ½(1 + a). We have

x_r − √a = (x_{r−1} − √a)²/(2x_{r−1}),   r = 0, 1, 2, ...,   (3.4)

where for r = 0 we must define x_{−1} = 1. Hence x_r ≥ √a
and x_r² ≥ a. (This is the reason for choosing x₀ = ½(1 + a).
If we choose x₀ = 1 it would no longer be true that x_r² ≥ a
for all r ≥ 0.) Equation (3.4) proves that the procedure is
second-order. From (3.4)

x_r − √a = ½(1 − √a/x_{r−1})(x_{r−1} − √a).

Hence the procedure is convergent if

| 1 − √a/x_{r−1} | < 1.
PROGRAM 3.4

Label   Statement
        Read a
        If a ≥ 0 go to 12
        Print a
        Stop
12      x = ½(1 + a)
2       y = x
        x = ½(x + (a/x))
        If x < y go to 2
        Print x, a
Since x_{r−1} ≥ √a for all r the procedure is always convergent.
We have also

x_{r+1} − x_r = (a − x_r²)/(2x_r) ≤ 0,

so that successive iterates decrease until the procedure
converges or reaches the noise-level of the calculation.
This means that we can use the condition x_{r+1} ≥ x_r as a
criterion for convergence. After the " Print x, a " order
the machine goes on to the next part of the program.
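Program 3.4 translates almost line for line into Python (the function name is ours; a = 0 is treated separately here, since the unmodified loop would not terminate for it, a point the text raises in connection with Program 3.5):

```python
def square_root(a):
    if a < 0:
        raise ValueError(a)        # the "Print a / Stop" branch
    if a == 0.0:
        return 0.0                 # the case a = 0, handled separately
    x = 0.5 * (1.0 + a)            # x_0, so that x_r**2 >= a throughout
    while True:
        y = x                      # label 2: remember the old iterate
        x = 0.5 * (x + a / x)
        if x >= y:                 # the criterion x_{r+1} >= x_r
            return x               # "Print x, a"
```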
One disadvantage of Program 3.4 is that if a is very
different from unity then the procedure is initially somewhat
slowly convergent. This is avoided in Program 3.5 where
by successive multiplication or division by 16 we ensure
that the iterative procedure is used only on numbers
between ¼ and 4. To obtain an accuracy of say 10⁻¹⁰ it
is readily verified that no more than five iterations are
necessary and this fact is used in the program. The para-
meter r in the " Do " statement is used merely to count the
number of iterations and it does not occur in any other
statement. Note that the case a = 0 has to be considered
separately.
One way in which this example is unrealistic is connected
with the efficiency of the method. The square root opera-
tion will be provided as a standard routine for any machine
and it will be used many millions of times per year. If we
can save one millisecond in the time taken by the square
root program this will save about 17 minutes of computer
time every million square roots that are computed. Such
economies can quickly build up into a substantial saving
of computer time. Hence it is worth while devoting a
considerable amount of effort to developing an efficient
square-root program. The basic machine language will be
used and advantage will be taken of any special features
which allow time to be saved. The way in which a computa-
tion is programmed will then depend to a large extent on
the way in which calculations are performed by the arith-
metic unit inside the machine. Thus if all numbers inside
the machine must be less than unity in magnitude and
0 < a < 1 we cannot calculate ½(1 + a) by first of all forming
1 + a, although ½ + ½a would be permissible. However,
considerations of this kind lie outside the scope of this
book.
Ex. 3.2. Draw a flow-diagram for Program 3.5.
Ex. 3.3. Find the real roots of
P_n(x) = a₀xⁿ + a₁xⁿ⁻¹ + ... + a_n = 0.
PROGRAM 3.5

Label   Statement
        Read a
        Print a
        If a ≥ 0 go to 12
        Stop
12      k = 0
        If a > 4 go to 3
        If a < ¼ go to 4
6       u = ½(1 + a)
        Do up to 5 for r = 0(1)4
5       u = ½(u + (a/u))
        Go to 8
3       a = a/16
        k = k + 1
        If a > 4 go to 3
        Go to 6
4       If a > 0 go to 7
        x = 0
        Go to 9
7       a = 16a
        k = k − 1
        If a < ¼ go to 7
        Go to 6
8       x = 4^k u
        a = 16^k a
9       Print x, a
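A Python rendering of Program 3.5 may make the scaling clearer (the function name is ours; multiplying by 4^k at the end undoes the scaling of a by 16^(−k)):

```python
def square_root_scaled(a0):
    if a0 < 0:
        raise ValueError(a0)
    if a0 == 0.0:
        return 0.0                  # the separate case a = 0 (label 4 branch)
    a, k = a0, 0
    while a > 4.0:                  # label 3: divide by 16 until a <= 4
        a, k = a / 16.0, k + 1
    while a < 0.25:                 # label 7: multiply by 16 until a >= 1/4
        a, k = 16.0 * a, k - 1
    u = 0.5 * (1.0 + a)
    for r in range(5):              # Do up to 5 for r = 0(1)4
        u = 0.5 * (u + a / u)       # five iterations suffice on [1/4, 4]
    return 4.0**k * u               # x = 4^k u
```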
In this example we shall use x₁, x₂, ... x_m to denote the
real roots of this polynomial. (Elsewhere we have used
x_r to denote the rth approximation to a root x.) Suppose
that the following procedure is used. We first of all find
the root of smallest absolute value x₁, and then divide
P_n(x) by x − x₁ giving a polynomial P_{n−1}(x) of degree n − 1.
We then find the next smallest root x₂ and divide P_{n−1}(x)
by x − x₂ obtaining P_{n−2}(x) and so on. This sequence is
repeated until we obtain x₁, x₂, ... x_m and P_{n−m}(x) where
this final polynomial has only complex roots. It is found
that this method is very satisfactory numerically if the
well-conditioned roots are found and removed first. It is
for this reason that the roots are found in the order of their
size, starting with the smallest, since the smaller roots tend
to be better conditioned than the larger. This has already
been mentioned in connection with (2.24). When finding
roots by factor-removal methods it might seem desirable
to try to avoid error build-up due to errors in the roots
which are found first by performing final iterations on the
original polynomial. However, this can be inconvenient
and it is not usually necessary if the well-conditioned roots
are found and removed first, since a badly conditioned root
is difficult to determine accurately whether we iterate on
the original polynomial or on a reduced polynomial.
The roots will be found by means of the Newton-Raphson
procedure. This means that it is necessary to estimate first
approximations for the roots. For the first approximation
to x_r (r > 1), the smallest root of P_{n−r+1}(x), we shall choose
x_{r−1}, the value of the smallest root of P_{n−r+2}(x) which has
just been found at the previous stage of the calculation.
This is a reasonable procedure since if we find the smallest
root at the first stage of the calculation we are likely to find
the roots in increasing order of size. Also in the case
where there are two roots close together, which is a trouble-
some case, the first of the two roots will give a close initial
estimate for determination of the second.
To start the procedure we need an estimate of x₁, which
we shall denote by (x₁)₀. This will be found in the following
way. If | a_{n−1} | ≥ | a_n | we set (x₁)₀ = −a_n/a_{n−1}; otherwise
we set (x₁)₀ = 1. To understand why we choose (x₁)₀ in
this way we recall that (§ 2.5 (iii))

−a_{n−1}/a_n = 1/x₁ + 1/x₂ + ... + 1/x_n.

A large value of | a_{n−1}/a_n | indicates that at least one of the
x_j is small, and it is reasonable to estimate the smallest root
by choosing (x₁)₀ = −a_n/a_{n−1}. On the other hand a small
value of | a_{n−1}/a_n | does not necessarily indicate that the x_j
are large, since cancellation may occur. A simple example
is x² + 0·0001x − 1 = 0 where −a_n/a_{n−1} = 10,000 but the
roots are of order unity. It would obviously be unreason-
able to take (x₁)₀ = −a_n/a_{n−1} in this case. If we choose
(x₁)₀ = 1 when | a_{n−1} | < | a_n | then if the smallest root is
much greater than unity the Newton-Raphson procedure
should still converge to it, though initially the rate of con-
vergence may be slow; and the procedure will converge to a
small root if this exists and | a_{n−1}/a_n | is small merely
because cancellation has occurred.
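In Python the rule for (x₁)₀ is a one-line decision (a sketch; the coefficients are supplied as a list a[0], ..., a[n]):

```python
def first_estimate(a):
    # (x1)_0 = -a_n/a_{n-1} if |a_{n-1}| >= |a_n|, and 1 otherwise
    return -a[-1] / a[-2] if abs(a[-2]) >= abs(a[-1]) else 1.0
```

For x² + 0·0001x − 1 = 0 this gives 1, avoiding the useless estimate 10,000.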
The problem of how to define (x₁)₀ illustrates one of the
difficulties of automatic computation. The program must
tell the machine exactly what to do. Ideally we should like
to be certain that the value which we tell the machine to
take as (x₁)₀ is a reasonable approximation to the smallest
root. But we can always invent examples for which any
specific criterion fails. The percentage of failures can be
reduced by giving the machine more complicated rules for
the determination of (x₁)₀. We have to compromise between
the difficulty of programming more complicated procedures,
and the desire to ensure that (x₁)₀ is a good approximation
to the smallest root of the polynomial. The suggested
procedure will not always locate all the real roots. This is
partly because we are writing a general program to solve a
wide class of problems. It is not surprising that we can
invent examples for which the program will not work. On
the whole, the more specific the problem the easier it is to
write a program such that it will be virtually certain that an
automatic computer will obtain the correct result.
Program 3.6 gives one method for carrying out the
previously described procedure for finding the real roots of
a polynomial. A flow-diagram for this program is given in
Fig. 3.2. Starting at the beginning of the program, when
statement 3 is reached the value (x₁)₀ defined above has
been assigned to a variable u. After defining the values of
various symbols the basic iterative procedure for finding the
correction δ is performed in statements 7 to 11, using the
method and notation described in some detail in § 2.5. If
| δ/u | > 10⁻⁴ it is assumed that the process has not yet
started to converge and the machine repeats the iteration.
(The figure 10⁻⁴ is chosen arbitrarily as a figure much
larger than the accuracy of the calculation which is, say,
1 in 10⁹, but small enough so that when | δ/u | < 10⁻⁴ we
can be reasonably certain that the procedure is converging.
Depending on circumstances it may be more convenient to
work in terms of absolute instead of relative values, say
| δ | < 10⁻⁴ indicates convergence.) If | δ/u | < 10⁻⁴ on
the first iteration the program defines a variable ε = δ and
then repeats the iteration. In general the program sets ε
equal to the old value of δ before starting an iteration to
find a new δ. If the iteration is converging normally we
have | δ | < | ε |. We assume that when, after a period of
normal convergence, | δ | ≥ | ε | the process has converged
and the root has been determined to the maximum possible
accuracy for the method used. The current values of u
and δ are printed out, where δ is the next correction to u
and gives some idea of the " noise-level " of the calculation.
(This idea could be extended by asking the machine to
perform say four iterations beyond the point where
| δ | ≥ | ε |, and record the maximum and minimum values
PROGRAM 3.6

Label   Statement
        Read a_i for i = 0(1)n
        Print a_i for i = 0(1)n
        u = 1
        If | a_{n−1} | < | a_n | go to 3
        u = −a_n/a_{n−1}
3       b₀ = a₀
        c₀ = a₀
        j = 1
5       r = 0
        Go to 7
6       ε = δ
        u = u + δ
7       Do up to 10 for m = 1(1)n−j+1
        b_m = a_m + b_{m−1}u
10      c_m = b_m + c_{m−1}u
11      δ = −b_{n−j+1}/c_{n−j}
        If | δ/u | < 10⁻⁴ go to 12
        r = r + 1
        If r < 20 go to 6
        Print u, δ, a_i for i = 0(1)n−j+1
20      Stop
12      If r = 0 go to 6
        If | δ | < | ε | go to 6
        Print u, δ
        If j = n go to 20
        j = j + 1
        Do up to 15 for m = 1(1)n−j+1
15      a_m = b_m
        Go to 5
[Fig. 3.2. Flow-diagram for Program 3.6. Boxes: Read and print a_i
(i = 0 to n) → set u = 1, or u = −a_n/a_{n−1} if | a_{n−1} | ≥ | a_n | →
Set b₀ = c₀ = a₀ → j = 1 → Set r = 0 → (Set ε = δ, u = u + δ on later
passes) → compute δ = −b_{n−j+1}/c_{n−j} → " Is | δ/u | < 10⁻⁴? "
(No: increase r by 1; " Is r < 20? " Yes: repeat the iteration; No:
print u, δ, a_i for i = 0(1)n−j+1 and Stop) (Yes: " Is r = 0? ", then
" Is | δ | < | ε |? "; either Yes: repeat the iteration) → Print u, δ →
" Is j = n? " (Yes: Stop; No: replace a_m by b_m, increase j by 1,
return to " Set r = 0 ").]
of u. Note that it is a better check on the accuracy of the
calculation to print out δ than to print out the residuals
which may be small even though the roots are inaccurate,
as shown in § 2.6.) Once the process has converged the
program tests to see if j = n which indicates that the
polynomial has real roots, all of which have been found.
If this is the case the calculation is stopped. Otherwise the
new polynomial P_{n−j}(x) is set in place of P_{n−j+1}(x) and the
iteration is repeated. The variable u is already set up as the
root of P_{n−j+1}(x) which has just been found, and this is
the appropriate first approximation to the smallest root of
P_{n−j}(x).
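The structure of Program 3.6 can be sketched in Python using the b and c recurrences of § 2.5. The stopping rules here are simplified (a fixed tolerance in place of the ε-comparison), so this is an illustration of the method rather than a transcription:

```python
def real_roots(a, tol=1e-12, max_iter=50):
    a = list(a)                      # coefficients a_0, ..., a_n
    roots = []
    # initial estimate (x1)_0 as described in the text
    u = -a[-1] / a[-2] if abs(a[-2]) >= abs(a[-1]) else 1.0
    while len(a) > 2:
        for _ in range(max_iter):
            b, c = [a[0]], [a[0]]            # b_0 = c_0 = a_0
            for m in range(1, len(a)):
                b.append(a[m] + b[-1] * u)   # b_m = a_m + b_{m-1} u
                c.append(b[-1] + c[-1] * u)  # c_m = b_m + c_{m-1} u
            delta = -b[-1] / c[-2]           # -b_{n-j+1} / c_{n-j}
            u = u + delta
            if abs(delta) <= tol * max(1.0, abs(u)):
                break
        else:
            return roots             # no convergence (cf. r = 20: stop)
        roots.append(u)
        a = b[:-1]                   # replace a_m by b_m: deflate, and
    roots.append(-a[-1] / a[-2])     #   u carries over as the next estimate
    return roots
```

As in the text, the root just found is left in u as the first approximation to the smallest root of the deflated polynomial.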
§3.4. General comments. We shall illustrate briefly
some differences between automatic and hand computing
by considering the problem of finding the real roots of a
polynomial, assuming that the program of Ex. 3.3 is used
for the automatic computer and a similar method is used
for the computation by hand.
(i) In hand computing a convenient method for locating
the roots is to evaluate f(∞), f(0), f(−∞) and then f(x)
for suitable intermediate x. We should probably draw a
rough graph of y = f(x). Although it is not impossible
to program this type of procedure we have preferred a
simpler method for the automatic computer program.
(ii) In hand computing, when calculating the smallest
root of each of the polynomials P_{n−j}(x) we would usually
try to obtain an accurate first approximation for this root
rather than merely taking the root of P_{n−j+1}(x) found at
the previous stage of the calculation.
(iii) In hand computing we can control the number of
decimal places used in the calculation. If the final answer
is required to a large number of decimal places we would
use a comparatively small number of decimal places in the
early stages but increase the accuracy as the procedure
converges. There is no point in doing this with an auto-
matic computer.
(iv) In hand computing we would decide the number of
decimal places accuracy required in the final result and
adjust the calculation accordingly. In automatic computing
it is simpler to work to the full accuracy available in the
machine and determine the " noise-level " for each root.
(v) In hand computing we would check at the end of the
calculation that we have located all the real roots. We
have not incorporated any such check in our program.
When using a computer it might be simpler to merely
calculate all the roots, both real and complex, and then we
should be sure that all the real roots have been determined.
(vi) In hand computing a convenient check is to divide
the final polynomial with complex roots into the original
polynomial, which should give zero remainder, though
there will in fact be a small remainder from inaccuracies
due to rounding-off error. We can then check that the sum
and product of the known roots agree with the appropriate
coefficients of the dividend. In automatic computing it is
probably unnecessary to apply this check.
The following are some general comments.
On the whole, in programming for an automatic
computer we prefer a numerical method which involves a
simple repetitive procedure, even though this is repeated a
large number of times, to a method involving a complicated
procedure which is repeated only a few times. The reverse
is often true in hand computing.
In hand computation the total number of operations
involved is usually comparatively small and the effect of
round-off errors can usually be made negligible merely by
keeping one or two extra decimal places. In automatic
computing the calculations tend to be large in scale and the
number of arithmetical operations may be enormous. It
may be much more difficult to estimate the effects of errors
of various kinds and it may be important to use methods
which minimise certain kinds of error.
The possibility of an automatic computer making
arithmetical mistakes can be ignored when using systems
of the type described in this chapter. In this respect
computation by an automatic machine is very different from
hand computation. On the other hand errors such as loss
of significant figures caused by subtracting two large nearly
equal numbers to give a small one may be much more
serious than in hand computation since they may easily
pass undetected. It is most important to bear in mind
that we have no information about what goes on in an
automatic computer between the input figures and the
figures which are printed out. Whereas in hand computa-
tion we often recognise loss of accuracy or misleading
results due to unsuitable methods, while doing the calcula-
tion, an automatic computer will not do this for us unless
we realise beforehand that this may occur and allow for it
in the program.
Other aspects of the suitability of methods for automatic
computers will be mentioned in connection with the various
topics dealt with later in this book.
The main objects of describing the simplified program-
ming scheme in this chapter are threefold.
(a) To give the student some idea of how a program is
organised for an automatic computer. The
machine must be told in detail what to do, but it
can control its own calculations to a limited
extent.
(b) To give the student a programming system by means
of which he can write his own programs for the
standard methods used in numerical analysis.
We have already illustrated this in connection
with the Newton-Raphson procedure and further
examples will occur throughout this book.
(c) To give the student some understanding of the
differences between hand computing and machine
computing, and of the criteria which govern
whether a method is suitable for automatic
computing.
However, the reader should not be misled by the simplicity
of the programming system and programs in this book.
One of the major preoccupations of anyone using an auto-
matic computer is the efficiency of the procedure used,
since computer time is expensive. In practice full advantage
is taken of special features of a computer and its program-
ming system in order to devise the easiest and cheapest
method for doing a particular problem on a particular
machine. These factors often over-ride purely mathematical
considerations. They lie outside the scope of this book.
Examples III
Ex. 3.4. It is required to find a real root of an equation
f(x) = 0 by means of the following procedure. Two
numbers a and b are given such that f(a) < 0, f(b) > 0. We
define a sequence xᵣ, yᵣ by the equations

x₀ = a,  y₀ = b,
xᵣ₊₁ = ½(xᵣ + yᵣ),  yᵣ₊₁ = yᵣ,  if f{½(xᵣ + yᵣ)} ≤ 0,
xᵣ₊₁ = xᵣ,  yᵣ₊₁ = ½(xᵣ + yᵣ),  if f{½(xᵣ + yᵣ)} > 0.
(i) Write a program to determine a root to within 10⁻ᵐ,
where m is an integer specified in the input data, and it is
assumed that the number of significant figures used in the
machine will allow determination of the root to the required
degree of accuracy.
(ii) Write a program to determine the root to the
maximum number of significant digits allowed by the
accuracy of the evaluation of f(x), assuming that this is
limited by round-off error.
(Assume that " Read f(x) " means that a set of instructions
is given to the machine for evaluating f(x). Whenever
f(y), for instance, occurs on the right of an arithmetic
statement the machine will evaluate f(x) for x = y and
use the resulting number in the appropriate place in the
arithmetic statement.)
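The interval-halving rule of Ex. 3.4(i) can be sketched in a modern language. The following Python sketch is ours, not the book's; the function name and the illustrative equation x³ − 2 = 0 stand in for the "Read f(x)" instructions of the exercise.

```python
def bisect(f, a, b, m):
    """Halve [x, y] until its length is below 10**(-m),
    keeping f(x) <= 0 and f(y) > 0 as in Ex. 3.4."""
    x, y = a, b                 # f(a) < 0, f(b) > 0 assumed
    while y - x > 10 ** (-m):
        mid = 0.5 * (x + y)
        if f(mid) <= 0:
            x = mid             # root lies in [mid, y]
        else:
            y = mid             # root lies in [x, mid]
    return 0.5 * (x + y)

# Illustrative example (ours): the real root of x**3 - 2 = 0.
root = bisect(lambda x: x ** 3 - 2, 1.0, 2.0, 6)
```

Each pass halves the bracketing interval, so the number of repetitions needed for part (i) is known in advance, about m/log₁₀ 2.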
Ex. 3.5. Write a program for finding arc tan x by the
sequential procedure given in Ex. 2.14.
Ex. 3.6. Write a program (or a flow-diagram) for
computing sin x and cos x for any value of x, −∞ < x < ∞,
given that for 0 ≤ u ≤ π/6

sin u = a₁u + a₂u³ + a₃u⁵ + a₄u⁷,
cos u = (b₁ + b₂u² + b₃u⁴ + b₄u⁶)⁻¹,

where the aᵢ (i = 1 to 4), bᵢ (i = 1 to 4) are given constants.
For π/6 < 2v ≤ π/3 use the relations

cos 2v = 2 cos² v − 1,  sin 2v = cos(½π − 2v).

(The formulae for sin, cos are discussed in A. Ralston and
H. S. Wilf, Mathematical Methods for Digital Computers,
Wiley (1960), p. 23.)
Ex. 3.7. Write a program for evaluating the n smallest
positive roots of the equation cot x = kx by an iterative
procedure for k = a(b)c (cf. Ex. 2.9). Outline briefly some
of the principal differences between the way in which this
calculation could be performed on an automatic computer,
and the way in which it could be performed by desk
calculator. (In addition to the points in § 3.4, for hand
computation we should use mathematical tables for cot x
whereas in a computer functions need to be calculated.
This may influence the choice of method. In hand com-
puting we should check by differences as in § 7.3 below.)
CHAPTER IV
SIMULTANEOUS LINEAR ALGEBRAIC EQUATIONS
§4.1. Introduction. Very many problems in applied
mathematics can be reduced to the solution of a set of
simultaneous linear algebraic equations. It is therefore of
great importance to have efficient methods for the numerical
solution of such systems.
If we consider the equations

a₁₁x₁ + a₁₂x₂ + … + a₁ₙxₙ = b₁,
 .  .  .  .  .  .  .  .  .  .      (4.1)
aₙ₁x₁ + aₙ₂x₂ + … + aₙₙxₙ = bₙ,

the solution can be written, by Cramer's rule,†

xⱼ = Δⱼ/Δ,  (j = 1 to n),      (4.2)

where

    | a₁₁ … a₁ₙ |
Δ = |  ⋮      ⋮  |
    | aₙ₁ … aₙₙ |
and Δⱼ is Δ with the jth column replaced by the column of
b's. This solution is useful for certain kinds of problem
but it is seldom used in numerical work when the coefficients
are pure numbers since much more efficient methods are
available for this case. As a general rule the direct evalua-
tion of determinants is laborious and inefficient and should
be avoided wherever possible in numerical work.
† See A. C. Aitken, Determinants and Matrices, Oliver and Boyd
(1962), p. 55.
The above solution is of theoretical interest. It breaks
down when Δ, the determinant of the coefficients, is zero.
In this case we may have either a multiplicity of solutions
(when the Δⱼ are also zero) or inconsistent equations with
no solution (when at least one of the Δⱼ is non-zero, so
that if we imagine a limiting procedure in which Δ tends to
zero, the formula (4.2) gives apparently infinite xⱼ). We
need not go into detail since it will be assumed, when solving
simultaneous linear equations, that the determinant of the
coefficients is non-zero. It can be anticipated, however,
that difficulties will arise if Δ is nearly equal to zero (see
§ 4.4 below). If the bᵢ in (4.1) are zero then the Δⱼ in (4.2)
are zero, and non-zero xⱼ exist only if Δ = 0. This case is
of importance in Chapter VI.
It is convenient to distinguish between direct and iterative
methods of solution of linear equations. The amount of
computation involved in direct methods can be specified
in advance whereas an iterative computation is repeated
until the error is less than the required amount. Direct
methods are discussed in this chapter and the next. Iterative
methods are discussed in § 6.4.
When dealing with numerical examples we shall omit
the Xj and the addition and equality signs in (4.1) and write
merely an array of numbers:
a₁₁  a₁₂  …  a₁ₙ  b₁
 ⋮    ⋮        ⋮   ⋮
aₙ₁  aₙ₂  …  aₙₙ  bₙ
When working to a fixed number of decimal places it is
usually essential to arrange that the largest element in each
row and column of this array is of order unity. This can
be done by multiplying the original equations by constants
and introducing suitable (preferably simple) multiples of
the unknowns. We return to this point later (§ 4.3). If the
original coefficients are such that aᵢⱼ = aⱼᵢ (i.e. if the aᵢⱼ
are symmetrical about the principal diagonal a₁₁, a₂₂, … aₙₙ)
the scaling should be arranged to preserve the symmetry.
Ex. 4.1. Scale the equations

80X₁ +      5X₂ −     0.2X₃ =  2.8,
 5X₁ −    0.2X₂ +  0.0125X₃ = −0.0875,
−0.2X₁ + 0.0125X₂ + 0.00625X₃ = −0.025.

The scaling factors that should be used are more or less
arbitrary. If, for example, we introduce

x₁ = 40X₁,  x₂ = 2X₂,  x₃ = 0.5X₃,

and multiply the equations by 1, 20, 80 respectively, we
obtain

  2x₁ + 2.5x₂ − 0.4x₃ =  2.8,
2.5x₁ −   2x₂ + 0.5x₃ = −1.75,      (4.3)
−0.4x₁ + 0.5x₂ +  x₃  = −2.
§4.2. The method of successive elimination. The
method of successive elimination is well-known in elementary
algebra. Thus to solve (4.3) we can subtract 1.25 times the
first equation from the second, and add 0.2 times the first
equation to the third. This gives two equations in x₂, x₃.
Similarly x₂ can be eliminated from these two equations to
give x₃. Then back-substitution in previous equations yields
x₂, x₁ in succession.
The object of this section is to describe a tabular
procedure, with frequent checks, for the practical application
of this method when the coefficients are given decimal
numbers. The calculation and the checks can be arranged
in the tabular form given below. Although this is written
out only for a set of four equations in four unknowns, the
procedure in the general case should be clear. The square
brackets will be explained later in Ex. 5.4. The numbers in
round brackets are used for checks.
"ll
«12
"13
0,4
/>.
(*,)
(4.4)
'21
021
^22
"23
"24
b 2
(*2)
1 «,.
«3I
a i2
"33
"34
*>3
(s 3 )
«..
"41
a 4 2
"43
«44
''4
CO
(S.)
(Sa)
(S 3 )
(S 4 )
(S 5 )
CS = s)
''22
*23
*>24
Cl
(h * T 2 )
(4.5)
'32
l>32
b i3
''34
C3
('3 * T 3 )]
(4.6)
kl
[>42
V,
*44
c*
('4 * T 4 )]
(4.7)
^33
C34
d>
(«3 * V 3 )
(4.8)
,,
[<43
C44
<L
(»4 * L/ 4 )]
(4.9)
0-44
e*
("4 * V A )
(4.10)
Roots:
x, x 2 x 3 x 4 (S' s x S s )
The first step is to form check sums which will be used
later:

sᵢ = aᵢ₁ + aᵢ₂ + aᵢ₃ + aᵢ₄ + bᵢ,  (i = 1 to 4),
Sⱼ = a₁ⱼ + a₂ⱼ + a₃ⱼ + a₄ⱼ,  (j = 1 to 4),
S₅ = b₁ + b₂ + b₃ + b₄,
s = s₁ + s₂ + s₃ + s₄,
S = S₁ + S₂ + S₃ + S₄ + S₅.

We should find s = S exactly since no round-off errors are
involved, and this is the first check on the numerical work.
We then calculate

l₂₁ = a₂₁/a₁₁,  l₃₁ = a₃₁/a₁₁,  l₄₁ = a₄₁/a₁₁.
The 4 × 4 set of equations is reduced to a 3 × 3 set by
eliminating the first unknown:

bᵢⱼ = aᵢⱼ − lᵢ₁a₁ⱼ,  (i, j = 2, 3, 4),      (4.11)
cᵢ = bᵢ − lᵢ₁b₁,
tᵢ = sᵢ − lᵢ₁s₁.      (4.12)

Note that the steps involved in applying these three formulae
are of exactly the same kind, so that the procedure is easily
reduced to a routine. This holds also for the similar elimina-
tions which follow later. The bᵢⱼ, cᵢ, tᵢ are calculated line
by line and at the end of each line we perform a check by
calculating

Tᵢ = bᵢ₂ + bᵢ₃ + bᵢ₄ + cᵢ,  (i = 2, 3, 4),      (4.13)

which should agree with the tᵢ to within a unit or two in the
last significant figure, any difference being due to rounding
errors.
In the same way we calculate

l₃₂ = b₃₂/b₂₂,  l₄₂ = b₄₂/b₂₂.

The 3 × 3 set of equations is reduced to a 2 × 2 set by forming

cᵢⱼ = bᵢⱼ − lᵢ₂b₂ⱼ,  (i, j = 3, 4),      (4.14)

with similar equations for dᵢ, uᵢ, and a check

Uᵢ = cᵢ₃ + cᵢ₄ + dᵢ,  (i = 3, 4),

which should agree approximately with uᵢ.
The 2 × 2 set of equations is reduced to a single equation
by

d₄₄ = c₄₄ − l₄₃c₃₄,

with similar equations for e₄, v₄. Also V₄ = d₄₄ + e₄,
which should agree approximately with v₄.
At this stage we have reduced the original equations to
the system (4.4), (4.5), (4.8), (4.10):

a₁₁x₁ + a₁₂x₂ + a₁₃x₃ + a₁₄x₄ = b₁
        b₂₂x₂ + b₂₃x₃ + b₂₄x₄ = c₂      (4.16)
                c₃₃x₃ + c₃₄x₄ = d₃
                        d₄₄x₄ = e₄.

The unknowns can now be found by back-substitution.
We have

x₄ = e₄/d₄₄,
x₃ = (d₃ − c₃₄x₄)/c₃₃,  etc.

A check on the final answer is provided by the equation

S₅′ = S₁x₁ + S₂x₂ + S₃x₃ + S₄x₄.

S₅′ should agree with the previously computed S₅ except
possibly for a small difference due to rounding-off errors.
However, agreement between S₅ and S₅′ to a given number of
decimal places does not necessarily indicate that the xᵢ are
accurate to this number of places (see § 4.4).
A group of m sets of equations with the same aᵢⱼ but m
different right-hand sides bᵢₖ (k = 1 to m) can be solved by
including m columns bᵢₖ instead of a single column bᵢ.
Only one check column is needed, obtained by adding all
the elements in the corresponding rows.
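In a modern language the whole scheme, including the running row-sum checks (4.13), can be sketched in a few lines. The Python routine below is our own illustration, not the book's; the name `eliminate` and the tolerance used in the check are our choices.

```python
def eliminate(a, b):
    """Successive elimination with row-sum checks, as in Sec. 4.2.

    a is an n x n list of rows, b the right-hand side; returns the roots.
    """
    n = len(b)
    # Augment each row with b_i and the check sum s_i.
    rows = [a[i][:] + [b[i], sum(a[i]) + b[i]] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            l = rows[i][k] / rows[k][k]          # the multiplier l_ik
            for j in range(k, n + 2):            # coefficients, b, and check column
                rows[i][j] -= l * rows[k][j]
            # The recomputed row sum should agree with the carried check
            # column to within rounding error, cf. (4.13).
            assert abs(sum(rows[i][k + 1:n + 1]) - rows[i][n + 1]) < 1e-9
    # Back-substitution, as in (4.16).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - s) / rows[i][i]
    return x
```

Because the check column is carried through the same arithmetic as the other columns, any gross blunder in a row shows up immediately as a failed comparison, exactly as in the hand scheme.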
Ex. 4.2. A numerical example is worked on the following
page. An extra figure is carried in the intermediate cal-
culation, but the roots are rounded to four decimal places.

[The worked numerical table for Ex. 4.2 is not recoverable
in this copy.]

The positions of numbers correspond exactly with those
given earlier except for the occurrence of only a single
number in the right-hand check columns. Consider, for
example, t₃ and T₃ in (4.6). In the numerical calculation,
instead of writing down the two numbers separately, we
first of all write down t₃ using formula (4.12), which gives
0.20149. We then calculate T₃ using formula (4.13), and
this gives 0.20150. The difference between t₃ and T₃ is
permissible. For checking the next stage of the calculation
it is necessary to use the exact row sum T₃, not t₃ (why not?),
so we amend the recorded value of t₃ by crossing out the
last two digits, and replacing them by 50.
In this example, since the square array of coefficients aᵢⱼ
is symmetrical about the diagonal line containing a₁₁, a₂₂,
a₃₃, a₄₄, each of the square arrays derived from these
coefficients is symmetrical, apart from small differences due
to rounding-off errors. This can be used as a check, or
only half the non-diagonal terms need be computed.
§ 4.3. Choice of pivots and scaling. In our description
of the method of successive elimination in the last section
the element in the top left-hand corner of each array plays a
very special part. It is called the pivot. The lᵢⱼ are obtained
by dividing each of the other elements in the same column
as the pivot by the pivot. It is obvious that in any given
array any non-zero element can be used as the pivot.
Suppose that in the aᵢⱼ array we choose aᵣₛ as the pivot.
The rth row, which contains the pivot, is set aside. We
calculate, from the elements in the sth column,

lᵢₛ = aᵢₛ/aᵣₛ,  (i = 1 to 4, omitting i = r).
Then
bᵢⱼ = aᵢⱼ − lᵢₛaᵣⱼ,
where i = 1 to 4, j = 1 to 4, omitting i = r, j = s, i.e. the
row and column containing the pivot. The routine is
exactly as before.
In the elimination procedure of §4.2, equations (4.1)
with n = 4 were reduced to the form (4.16). It is easy to
see that the determinant of coefficients has not been altered
since one determinant is obtained from the other by adding
multiples of certain rows to other rows:
    | a₁₁ a₁₂ a₁₃ a₁₄ |   | a₁₁ a₁₂ a₁₃ a₁₄ |
Δ = | a₂₁ a₂₂ a₂₃ a₂₄ | = |     b₂₂ b₂₃ b₂₄ | = a₁₁b₂₂c₃₃d₄₄.      (4.17)
    | a₃₁ a₃₂ a₃₃ a₃₄ |   |         c₃₃ c₃₄ |
    | a₄₁ a₄₂ a₄₃ a₄₄ |   |             d₄₄ |
This is an important result, that the determinant of the
coefficients of the original equations is given by the product
of the pivots. It can be proved in the same way that this
result holds however the pivots are chosen, apart possibly
from the sign of the determinant.
It is sometimes stated without qualification that the
largest element in an array should always be chosen as the
pivot but the importance of this can be over-rated. Before
discussing this we consider an example.
Ex. 4.3. Solve the equations in Ex. 4.2 by the method of
successive elimination, using the largest element in each
array as the pivot.
If we assume that the coefficients of the equations in
Exs. 4.2, 4.3 are exact and solve the equations using a large
number of decimal places in the intermediate calculations
we find that the exact roots are
x₁ = −5.3868,  x₂ = 2.8137,  x₃ = 11.5729,  x₄ = −6.3653,      (4.18)
rounded to four decimals. The differences between this exact
solution and the results in Exs. 4.2, 4.3 are given in Table 4.1
which shows that, in this particular example, the results
obtained by taking the largest pivot at each stage are less ac-
curate than the results obtained by choosing the top left-hand
pivot. The reason for this is the smallness of the final pivot in
Ex. 4.3, namely 0.00251, which is considerably smaller
than 0.00893, the smallest pivot in Ex. 4.2. The size of the
smallest pivot in Ex. 4.3 limits the accuracy of the roots to
at most 1 in 500, in general. The reason for the very small
final pivot in Ex. 4.3 is clear from (4.17) which states that
the product of the pivots must be a constant. If we insist
on choosing as pivot the largest number available at each
stage this tends to lead to smaller pivots than might other-
wise be obtained. The logical conclusion of this argument
might seem to be that the pivots should be chosen so that
they are all nearly equal to each other, and therefore each
Table 4.1
Comparison of solutions obtained by the method of
successive elimination

                                   δx₁      δx₂      δx₃      δx₄
Top left pivots (Ex. 4.2)       −0.0160   0.0087   0.0161  −0.0081
Largest pivots (Ex. 4.3)        −0.0402   0.0213   0.0405  −0.0198
Top left pivots, after scaling  −0.0098   0.0055   0.0099  −0.0052
Largest pivots, after scaling    0.0148  −0.0077  −0.0149   0.0071
pivot should be roughly equal to the nth root of the deter-
minant. Apart from the fact that this is impractical since
the determinant is not known at the beginning of the
analysis it is not true that increasing the size of the smallest
pivot automatically increases the accuracy. It will be
shown in the next section that a certain inherent inaccuracy
is present in the roots of a set of simultaneous linear equa-
tions if these are found by any procedure involving rounding-
off errors. As long as the pivots are large enough so that
round-off in the smallest pivot does not imply a relative
error greater than the inherent error in the calculation, it
would seem that little additional accuracy can be obtained
by choosing the pivots in any special way. The essential
point is to avoid very small pivots rather than to arrange
that the smallest pivot is as large as possible. On the whole
the size of the pivots would seem to be of second-order
importance.
In §4.5 below, when programming the method of
successive elimination for automatic computers, we shall
pivot on the largest element in the left-hand column of
numbers in the array obtained after each elimination. This
avoids unnecessarily small pivots, and the multipliers are
all less than unity which may be convenient for some
computers.
The object of illustrating that the choice of pivots in
Ex. 4.2 gives more accurate results than the choice in Ex. 4.3
is not of course to imply that the choice of the top left-hand
number as pivot is to be preferred to the choice of the
largest pivot at each stage. Many of the equations which
occur in practice are such that, when calculating by hand,
they are suitable for solution by choosing the top left-hand
number as pivot even though this is not the largest possible
pivot. On the other hand if the top left-hand number is
appreciably smaller than the other numbers in an array
then it should certainly not be chosen as pivot.
Suppose next that we rescale the equations of Exs. 4.2,
4.3 by multiplying the third equation by two and the fourth
equation by four and introduce new unknowns

x₁′ = x₁,  x₂′ = x₂,  x₃′ = ½x₃,  x₄′ = ¼x₄.

It will be found that if the pivots are chosen as the top
left-hand numbers in each array as in Ex. 4.2 the pivots are

0.5400,  0.28009,  0.09396,  0.14290.

If the pivots are chosen to be the largest numbers at each
stage as in Ex. 4.3 then the pivots are

7.0032,  0.74465,  0.15567,  0.00249.
The differences between the exact results and the results
given by the two methods are given in Table 4.1. It is seen
that although scaling has improved the results slightly the
improvement is not striking. The smallest pivot when
using the top left-hand pivots is now comparatively large,
but the smallest pivot obtained by taking the largest pivot
at each stage is almost exactly the same as in Ex. 4.3 before
rescaling.
The point about rescaling is exactly the same as the
point previously made about the choice of pivot. Rescaling
can be advantageous in avoiding very small pivots rather
than arranging that the smallest pivot should be as large
as possible. Once the smallest pivot is above a certain
minimum value there is little point in further rescaling.
In practice if it turns out that the smallest pivot is
undesirably small it is usually preferable to improve the
solution by the method described in the next section
(Ex. 4.4) rather than repeat the solution with a different
choice of pivots, rescaled equations, or more decimal places.
§ 4.4. Inherent error and ill-conditioned equations. This
section is concerned with a characteristic difficulty which
occurs when solving linear equations. We first consider a
concrete example:

1.985x₁ − 1.358x₂ = 2.212,      (4.19)
0.953x₁ − 0.652x₂ = 1.062.

Elimination of x₁ gives

(1.985 × 0.652 − 0.953 × 1.358)x₂
        = −(1.985 × 1.062 − 0.953 × 2.212).      (4.20)

On working to six decimal places we find

0.000046x₂ = −0.000034,  x₂ = −0.739….
Similarly x₁ = 0.609…. In quoting the answers to three
decimal places we have in fact assumed that the coefficients
in (4.19) are correct to seven decimal places and that we
have worked to seven decimal places so that x₂ = −340/460,
x₁ = 280/460. If after evaluating the quantities in brackets
in (4.20) we had to round off to five decimal places we
should have x₂ = −0.00003/0.00005 = −0.6…, and the answer
is not even correct to one decimal place. Round-off
imposes a fundamental limitation on the accuracy with
which the roots can be obtained.
If instead of (4.19) we solve

1.985x₁ − 1.358x₂ = 2.212,
0.953x₁ − 0.652x₂ = 1.061,

we find x₁ = 30.13…, x₂ = 42.41…. An alteration of
one coefficient of (4.19) by one in a thousand (1.062
changed to 1.061) has altered the roots by factors of
about fifty to one (0.61 to 30, and −0.74 to 42).
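This sensitivity is easy to reproduce. The Python sketch below is our illustration: it solves the pair (4.19) by Cramer's rule, and then again with 1.062 changed to 1.061 (the function name `solve2` is ours).

```python
def solve2(a11, a12, a21, a22, c1, c2):
    """Solve a 2 x 2 system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - c1 * a21) / det

# Equations (4.19) as printed.
x1, x2 = solve2(1.985, -1.358, 0.953, -0.652, 2.212, 1.062)
# The same equations with 1.062 changed to 1.061.
y1, y2 = solve2(1.985, -1.358, 0.953, -0.652, 2.212, 1.061)
# x1, x2 are roughly 0.61 and -0.74; y1, y2 jump to roughly 30 and 42.
```

The determinant here is 0.000046 in magnitude, the difference of two numbers near 1.294, which is why a change in the sixth decimal of the data swings the roots so violently.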
The reason for the behaviour of the above equations is
clear. The second equation is nearly a multiple of the first,
so that we can write the equations algebraically as

ax₁ + bx₂ = c,
(ka + α)x₁ + (kb + β)x₂ = kc + γ,

where k is a suitable constant and α, β, γ are small. The
roots of these equations are

x₁ = (cβ − bγ)/(aβ − bα),  x₂ = (aγ − cα)/(aβ − bα).
The small numbers α, β, γ have a dominant effect on the
result. In computing determinants, large numbers have
cancelled by subtraction to give small final results. The
determinant of coefficients is small relative to the individual
terms in the expansion of the determinant. (Notice that it is
meaningless to say that the determinant is small — it has to
be small relative to some other quantity.)
If the Xⱼ are approximate roots of the set of linear equations
(4.1) we define the residuals rᵢ as

rᵢ = Σⱼ aᵢⱼXⱼ − bᵢ,  (i = 1 to n).      (4.21)

For equations (4.19), if we take X₁ = 0.638, X₂ = −0.696
we find

r₁ = −0.0004,  r₂ = −0.0002.

Roots which differ from the true roots by 1 in 20 produce
residuals which are less than the round-off error of the
coefficients. This illustrates the important point that
smallness of the residuals does not necessarily mean that the
corresponding estimates of the roots are accurate.
Equations like (4.19) are said to be ill-conditioned. The
opposite of ill-conditioned is well-conditioned. We talk of
the " condition " or " conditioning " of a set of equations.
As already illustrated, characteristic signs of ill-
conditioning are:
(a) To determine the roots to a specified number of
decimal places when round-off error is present it
is necessary to work to a much larger number of
decimal places in the intermediate calculations.
(b) Small alterations in the coefficients produce large
variations in the roots.
(c) The determinant of the coefficients is small in the
sense described above.
(d) Estimates of the roots which differ appreciably from
the true roots produce very small residuals.
Two other indications of ill-conditioning depend on
ideas which will be introduced later. In terms of the inverse
matrix defined in § 5.3, equations are ill-conditioned if the
elements of the inverse matrix are large in the sense described
at the end of Ex. 5.6. In terms of the concept of an eigen-
value defined in Chapter VI, equations are ill-conditioned
if the ratio of the largest to smallest eigenvalue of the matrix
of coefficients is large.
The equations in Ex. 4.2 are slightly ill-conditioned (see
also Ex. 5.6).
If the diagonal elements a₁₁, a₂₂, … aₙₙ are much
greater than the non-diagonal elements aᵢⱼ (i ≠ j) then the
equations are well-conditioned. If all aᵢⱼ (i ≠ j) are zero
then the equations are perfectly conditioned. If the
coefficients vary in size it is usually convenient to arrange,
by interchange of equations and/or scaling, that the largest
coefficients lie in or near the diagonal positions a₁₁, a₂₂, … aₙₙ.
It is useful to distinguish between two different aspects
of accuracy when solving linear equations.
(i) If the coefficients of the equations are assumed to
be exact but a fixed number of decimal places are
carried during the solution, how many decimal
places are accurate in the answer?
(ii) If the coefficients of the equations are inexact (for
example because they are obtained by rounding
off more accurate numbers) to how many places
are we justified in quoting the final answer?
It is easy to answer both of these questions in any
specific case, provided that the detailed solution of the
original equations is available and we are prepared to do a
little extra work. Suppose that in case (i) the roots obtained
by working to a fixed number of decimal places are denoted
by Xⱼ and we define residuals by (4.21) as before. Suppose
that the exact roots xⱼ are related to the approximate roots
by

xⱼ = Xⱼ + δxⱼ,  (j = 1 to n).

On substituting in (4.1) we readily find that the δxⱼ are the
roots of the equations

Σⱼ aᵢⱼδxⱼ = bᵢ − Σⱼ aᵢⱼXⱼ = −rᵢ,  (i = 1 to n).      (4.22)
The procedure is illustrated in the following example.
Ex. 4.4. Find the accuracy of the roots obtained in Ex. 4.2.
The residuals are found to be:

10⁵r₁ = 0.010,   10⁵r₂ = −5.072,
10⁵r₃ = −2.512,  10⁵r₄ = −2.506.

The major part of the work involved in solving (4.22) has
already been carried out in the table in Ex. 4.2 and it is
necessary to calculate only one new bᵢ column. The results
are almost exactly those given in Table 4.1:

δx₁ = 0.0160,  δx₂ = 0.0087,
δx₃ = 0.0161,  δx₄ = −0.00805.

Hence the roots in Ex. 4.2 are accurate to within one or two
digits in the second decimal place.
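The correction computation of Ex. 4.4 is easy to mechanise: solve once, form the residuals (4.21), then solve the same system again with −rᵢ as right-hand side. The Python sketch below is ours; `solve` is a plain elimination without the checks of § 4.2, and `refine` applies one step of (4.22).

```python
def solve(a, b):
    """Gaussian elimination with back-substitution (no pivot selection)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            l = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= l * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def refine(a, b, x):
    """One step of (4.22): the corrections dx solve sum_j a_ij dx_j = -r_i."""
    n = len(b)
    r = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    dx = solve(a, [-ri for ri in r])
    return [xi + di for xi, di in zip(x, dx)]
```

As the text notes, the elimination table need only be formed once; in this sketch `solve` repeats it for simplicity, but in practice only the new right-hand-side column is processed.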
In order to answer the question in (ii) above suppose
that possible errors in aᵢⱼ, bᵢ are denoted by δaᵢⱼ, δbᵢ, and
suppose that these alter the roots from xⱼ to xⱼ + δxⱼ. Then

Σⱼ (aᵢⱼ + δaᵢⱼ)(xⱼ + δxⱼ) = bᵢ + δbᵢ.

On neglecting second-order terms we find

Σⱼ aᵢⱼδxⱼ = δbᵢ − Σⱼ δaᵢⱼxⱼ,  (i = 1 to n).

If xⱼ = Xⱼ + εⱼ, where the Xⱼ are approximate roots, we
obtain, again neglecting second-order terms,

Σⱼ aᵢⱼδxⱼ = δbᵢ − Σⱼ δaᵢⱼXⱼ = ρᵢ, say.      (4.23)
In many cases we can regard a solution as satisfactory
if the error in the solution is comparable with the inherent
error due to inexact coefficients. From (4.22) and (4.23)
it is seen that a solution is satisfactory in this sense if the
residuals |rᵢ| are less than some average value (such as the
root-mean-square value) of the magnitude of the quantities
ρᵢ deduced from the possible errors in coefficients. As an
example consider the results in Ex. 4.2. The maximum
residual has magnitude 5 × 10⁻⁵ (Ex. 4.4). The possible
errors in coefficients can produce a maximum error of

|ρᵢ| ≤ 5 × 10⁻⁵(1 + Σⱼ |xⱼ|) ≈ 125 × 10⁻⁵.
Statistically the error will be considerably less than this but
it is unlikely that it will be reduced by the factor of 25
required to make the average |ρᵢ| comparable with the
largest residual. Hence the solution obtained in Ex. 4.2
is satisfactory in the sense defined at the beginning of this
paragraph. This type of result would seem to be the rule
rather than the exception when the method of successive
elimination is used, if the equations are scaled properly and
one or two guarding figures are kept in the body of the
calculation. The following example illustrates an empirical
approach to the problem.
Ex. 4.5. Alter the coefficients in the equations in Ex. 4.2
in the following way. Write the ten numbers −4(1)5
on slips of paper and for each aᵢⱼ and bᵢ pick one of the
ten slips at random. Alter the corresponding coefficient
by 10⁻⁵ times the number on the slip unless this number
is 5, in which case alter the number by 5 × 10⁻⁵ if the
end-digit is even and −5 × 10⁻⁵ if the end-digit is odd.
Find the change produced in the roots (4.18).

An experiment of this type gave

10⁴ρ₁ = −5.23,  10⁴ρ₂ = 6.98,
10⁴ρ₃ = −4.60,  10⁴ρ₄ = 4.46,

δx₁ = −0.0852,  δx₂ = 0.0447,
δx₃ = 0.0814,  δx₄ = 0.0398.
The errors in the solution in Ex. 4.2 (see Ex. 4.4) are about
one-fifth of these.
We have now gone as far as we can, within the scope and
limitations of this book, in considering the errors involved
in solving simultaneous linear equations. An interesting
theoretical treatment from a comparatively elementary
point of view is given in Modern Computing Methods,
H.M. Stationery Office, 2nd Edn. (1961), Chapter 5.
§ 4.5. A computer program for the method of successive
elimination.† Program 4.1 is a computer program for
solving an n x n set of simultaneous linear equations by the
method of § 4.2. The unknowns are eliminated in the order
x₁, x₂, … xₙ. The pivot at each stage is chosen as the largest
element in the first column of the appropriate array. When
the statement labelled 12 is reached, this largest element
has been determined and the appropriate rows have been
interchanged so that the row containing the largest pivot
is now the first row in the array from which the next
unknown is to be eliminated. When statements 16 and 17
are completed the appropriate unknowns have been
eliminated and the machine proceeds to eliminate the next
unknown. The remainder of the program is concerned
with back-substitution, computation of the determinant,
and printing out the results. (It is advisable to print out
some measure of the condition of the equations. The size
of the determinant and the smallest pivot give some indica-
tion of whether the equations are badly conditioned. In
order to save unnecessary printing it may be arranged that
these are printed only if they fall below certain limits.)
These hints should enable the reader to work through the
program in detail, as an exercise. This will be more in-
structive than pages of further explanation.
† See also Modern Computing Methods, H.M. Stationery Office,
2nd Edition (1961), Chapter 2.
Program 4.1

Label   Statement
        Read n, a_ij for i = 1(1)n and j = 1(1)n,
            b_i for i = 1(1)n
        Print n, a_ij for i = 1(1)n and j = 1(1)n,
            b_i for i = 1(1)n
        Do up to 3 for i = 1(1)n
3       a_i,n+1 = b_i
        Do up to 17 for r = 1(1)n-1
        q = r
4       p = q
5       If q = n go to 10
        q = q+1
6       If a_pr < a_qr go to 4
        Go to 5
10      Do up to 12 for j = r(1)n+1
        c_j = a_rj
        a_rj = a_pj
12      a_pj = c_j
16      Do up to 17 for i = r+1(1)n and
            j = r+1(1)n+1
17      a_ij = a_ij - (a_ir a_rj)/a_rr
        x_n = a_n,n+1/a_nn
        Do up to 22 for r = n-1(-1)1
        d = 0
        Do up to 20 for j = r+1(1)n
20      d = d + a_rj x_j
22      x_r = (a_r,n+1 - d)/a_rr
        D = a_11
        Do up to 25 for i = 2(1)n
25      D = D a_ii
        Print x_j for j = 1(1)n, a_ii for i = 1(1)n, D.
        Stop
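As an aid to tracing Program 4.1, the Python sketch below follows the same stages; it is our transcription, not part of the book. The function name is ours; unlike statement 6 it compares magnitudes when selecting the pivot, and it attaches a sign to D when rows are interchanged (the text notes the sign ambiguity after (4.17)).

```python
def successive_elimination(a, b):
    """Solve the n x n system a x = b along the lines of Program 4.1.

    Pivots on the largest (in magnitude) element of the leading column
    at each stage; returns the roots, the pivots, and the determinant D.
    """
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]    # statements up to label 3
    det_sign = 1.0
    for r in range(n - 1):
        # Labels 4-10: locate the largest pivot in column r.
        p = max(range(r, n), key=lambda i: abs(m[i][r]))
        if p != r:                                  # labels 10-12: swap rows
            m[r], m[p] = m[p], m[r]
            det_sign = -det_sign
        # Labels 16-17: eliminate x_r from the remaining rows.
        for i in range(r + 1, n):
            for j in range(r + 1, n + 1):
                m[i][j] -= m[i][r] * m[r][j] / m[r][r]
    # Back-substitution (statements down to label 22).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    pivots = [m[i][i] for i in range(n)]
    det = det_sign                                  # labels up to 25: D = product of pivots
    for piv in pivots:
        det *= piv
    return x, pivots, det
```

Returning the pivots as well as D follows the advice in the text: the smallest pivot and the size of the determinant give some indication of the condition of the equations.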
Examples IV
(Examples for practice in numerical work can be
derived from those in the text by rescaling the equations in
Ex. 4.2 or by working the examples to a different number of
decimal places. Some of the Examples V are relevant to
this chapter.)
Ex. 4.6. Modify Program 4.1 so as to solve m sets of
equations with the same set of coefficients aᵢⱼ but m different
right-hand sides bᵢₖ (k = 1 to m).
Ex. 4.7. Draw a flow-diagram for Program 4.1.
Ex. 4.8. Show that the number of multiplications
involved in solving n simultaneous linear equations in n
unknowns by the method of successive eliminations (§ 4.2)
is of the order of n³/3. (This is an important result because
the main labour in solving large sets of equations lies in the
multiplications, so that the labour increases rapidly with
the size of the set of equations. The same result holds for
the compact elimination method of § 5.2 below.)
Ex. 4.9. Consider the set of equations

    b_1 x_1 + c_1 x_2 = d_1,
    a_j x_{j-1} + b_j x_j + c_j x_{j+1} = d_j,   (j = 2, 3, ... n-1),
    a_n x_{n-1} + b_n x_n = d_n.

Show that the x_j can be determined from the formulae

    g_1 = c_1/b_1,   g_j = c_j/(b_j - a_j g_{j-1}),   (j = 2, 3, ... n-1),
    h_1 = d_1/b_1,   h_j = (d_j - a_j h_{j-1})/(b_j - a_j g_{j-1}),   (j = 2, 3, ... n),
    x_n = h_n,       x_j = h_j - g_j x_{j+1},   (j = n-1, n-2, ... 1).
If a_j < 0, b_j > 0, c_j < 0, and a_j + b_j + c_j ≥ 0, show that
-1 ≤ g_j < 0. (The number of operations involved is
proportional to n compared with a number of operations
proportional to n³ for a general set of equations. The
method can be justified from first principles but it is
instructive to consider it from the point of view of the
matrix decomposition in Ex. 5.11. Equations of the above
type occur in connection with solution of the heat-conduc-
tion equation by implicit difference methods as in Ex. 11.2.)
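The recurrences of Ex. 4.9 (often called the Thomas algorithm) can be sketched in Python; the function name and argument layout below are illustrative only:

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system by the recurrences of Ex. 4.9.
    a is the subdiagonal (a[0] unused), b the diagonal,
    c the superdiagonal (c[n-1] unused), d the right-hand side."""
    n = len(b)
    g = [0.0] * n   # the g_j of Ex. 4.9
    h = [0.0] * n   # the h_j
    g[0] = c[0] / b[0]
    h[0] = d[0] / b[0]
    for j in range(1, n):
        den = b[j] - a[j] * g[j - 1]
        if j < n - 1:
            g[j] = c[j] / den
        h[j] = (d[j] - a[j] * h[j - 1]) / den
    # Back-substitution: x_n = h_n, x_j = h_j - g_j x_{j+1}.
    x = [0.0] * n
    x[-1] = h[-1]
    for j in range(n - 2, -1, -1):
        x[j] = h[j] - g[j] * x[j + 1]
    return x
```

The work is visibly proportional to n: each sweep is a single loop over the rows.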
CHAPTER V
MATRIX METHODS
§5.1. Matrix algebra. Matrices provide a useful
method for systematising both the theoretical and the
practical aspects of certain computing procedures, par-
ticularly in connection with automatic computers. We
start Chapters V and VI with short summaries of the
necessary theory,† to provide a guide for the beginner who
might otherwise be confused by the mass of detail in text-
books on matrices.
An m × n matrix A is simply a rectangular array of mn
numbers, arranged in m rows and n columns. It is said to
be of order " m by n " or " m × n ":

    A = [a_ij] = [ a_11  a_12  ...  a_1n ]
                 [ a_21  a_22  ...  a_2n ]
                 [ ...................... ]
                 [ a_m1  a_m2  ...  a_mn ]

The (i, j)th element a_ij represents the element in the ith
row and the jth column of the matrix.
Consider operations involving two matrices A = [a_ij]
and B = [b_ij]. Equality, addition, and subtraction are
meaningful terms if and only if the matrices have the same
number of rows and the same number of columns. If this
† Further information can be found in A. C. Aitken, Determinants
and Matrices, Oliver and Boyd (1962).
is true then:
    A = B if and only if a_ij = b_ij for all i, j.
    A ± B = [a_ij ± b_ij].
This last equation means that the sum or difference of two
matrices with equal numbers of rows and columns is the
matrix such that any element is the sum or difference of
the corresponding elements in A and B.
To multiply a matrix by a scalar, say k, each term of the
matrix is multiplied by k:

    kA = [k a_ij].
A matrix with only one row is called a row matrix (or a
row vector) and a matrix with only one column is called a
column matrix (or a column vector). The scalar product of
a row matrix and a column matrix, taken in that order,
is meaningful if and only if the row and column have the
same number of terms, and then it consists of a single
number defined as in the following example:
    [x_1  x_2  ...  x_n] [ y_1 ]
                         [ y_2 ]  =  x_1 y_1 + x_2 y_2 + ... + x_n y_n.
                         [ ... ]
                         [ y_n ]
Two matrices can be multiplied together if and only if
the number of columns in the first is equal to the number of
rows in the second. Then the element in the ith row and
the jth column of the product is the scalar product of the
ith row of the first matrix with the jth column of the second.
If A is m × n, B is n × p, and

    AB = C,

then the rule determining c_ik, the elements of the product
matrix C, is

    c_ik = Σ_{j=1}^{n} a_ij b_jk,

and C is m × p. As an example,

    [ a_11  a_12 ] [ b_11  b_12 ]   [ a_11 b_11 + a_12 b_21   a_11 b_12 + a_12 b_22 ]
    [ a_21  a_22 ] [ b_21  b_22 ] = [ a_21 b_11 + a_22 b_21   a_21 b_12 + a_22 b_22 ]
    [ a_31  a_32 ]                  [ a_31 b_11 + a_32 b_21   a_31 b_12 + a_32 b_22 ]
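In code, the rule c_ik = Σ_j a_ij b_jk is a triple loop (or a nested comprehension). The following Python helper is a sketch with invented names, included only to make the rule concrete:

```python
def matmul(a, b):
    """Product of an m x n matrix a and an n x p matrix b,
    using the rule c_ik = sum over j of a_ij * b_jk."""
    m, n, p = len(a), len(b), len(b[0])
    # The product exists only when columns of a equal rows of b.
    assert all(len(row) == n for row in a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]
```

Trying `matmul` with the factors in both orders illustrates at once that AB and BA are in general different (or that one of them may not exist at all).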
The order in which matrices are multiplied together is
obviously very important. The existence of the product
AB is quite independent of the existence of BA and even if
AB and BA both exist they will in general be unequal and
there is no very useful connection between the two. In the
product AB the matrix A is said to premultiply B, and B is
said to postmultiply A.
Although the commutative law does not hold for
multiplication (AB ≠ BA in general) the distributive and
associative laws are true, e.g.
(A + B)C = AC+ BC, A(BC) = (AB)C = ABC.
It is useful to define certain special types of matrix.
A matrix with the same number of rows and columns is
called a square matrix. Associated with any square matrix
A we have the determinant of the matrix:
det/f =
'11
<nl
'In
If A and B are square matrices of the same order then
(Aitken, loc. cit., p. 80)
det AB = del A . det B. (5.1)
A diagonal matrix D = [d_ij] is a square matrix with all
the d_ij zero except d_11, d_22, ... d_nn, which are the elements
along the principal diagonal, or the diagonal elements. A
diagonal matrix all of whose diagonal elements are unity
is called a unit matrix, and it is denoted by I. Clearly

    AI = IA = A,   det I = 1.
In matrix algebra I plays the same role as unity in ordinary
algebra. The transpose A' of a matrix A is defined by the
property that if A is an m × n matrix whose (i, j)th element
is a_ij then A' is an n × m matrix whose (i, j)th element is
a_ji, i.e. [a_ij]' = [a_ji]. A symmetrical matrix is a square
matrix with a_ij = a_ji, i.e. A' = A.
§ 5.2. A compact elimination method for the solution of
linear equations. We describe a compact elimination method
for the solution of linear equations which is essentially a
telescoped version of a successive elimination method (see
§ 4.2 and Ex. 5.4). It is convenient to work in terms of
matrices. We consider only 3x3 matrices when describing
the theory, but the extension to the general case is direct.
Lower and upper triangular matrices L, U are defined as
matrices with zero elements above and below the principal
diagonal, respectively:
    L = [ l_11   0     0   ],   U = [ u_11  u_12  u_13 ]
        [ l_21  l_22   0   ]        [  0    u_22  u_23 ]
        [ l_31  l_32  l_33 ]        [  0     0    u_33 ]
The set of linear simultaneous equations (4.1), for the
case n = 3, can be expressed in matrix notation as

    Ax = b,

where

    A = [ a_11  a_12  a_13 ],   x = [ x_1 ],   b = [ b_1 ]
        [ a_21  a_22  a_23 ]        [ x_2 ]        [ b_2 ]
        [ a_31  a_32  a_33 ]        [ x_3 ]        [ b_3 ]
We first of all show that A can be expressed as the product
of two matrices in the form A = LU where L is a lower
triangular matrix as defined above and U is an upper
triangular matrix with units along the principal diagonal
(i.e. u_ii = 1). (The diagonal elements of either L or U, but
not of both, can be chosen arbitrarily.) We have

    [ a_11  a_12  a_13 ]   [ l_11   0     0   ] [ 1  u_12  u_13 ]
    [ a_21  a_22  a_23 ] = [ l_21  l_22   0   ] [ 0   1    u_23 ]
    [ a_31  a_32  a_33 ]   [ l_31  l_32  l_33 ] [ 0   0     1   ]

      [ l_11    l_11 u_12           l_11 u_13                     ]
    = [ l_21    l_21 u_12 + l_22    l_21 u_13 + l_22 u_23         ]
      [ l_31    l_31 u_12 + l_32    l_31 u_13 + l_32 u_23 + l_33  ]
On equating the individual elements in the first and last
matrices we obtain equations which determine the elements
of L and U. It is found that the elements of L and U can
be determined as follows:
(a) The first column of L:

    l_11 = a_11,   l_21 = a_21,   l_31 = a_31.

(b) The first row of U:

    u_12 = a_12/l_11,   u_13 = a_13/l_11.

(c) The second column of L:

    l_22 = a_22 - l_21 u_12,   l_32 = a_32 - l_31 u_12.

(d) The second row of U:

    u_23 = (a_23 - l_21 u_13)/l_22.

(e) The third column of L:

    l_33 = a_33 - l_31 u_13 - l_32 u_23.
The importance of doing the computations in the above
order is that the quantities required to compute the value of
an element at any stage of the calculation have already been
computed at some previous stage. In hand computation it
is of course important to evaluate sums of products in one
machine operation, without intermediate recording.
Assuming that L, U are known, we can write Ax = b as

    LUx = b.   (5.2)

We introduce y defined by

    y = Ux.   (5.3a)

Then (5.2) gives

    Ly = b.   (5.3b)

Written out in full, equations (5.3a, b) are:

    l_11 y_1                     = b_1
    l_21 y_1 + l_22 y_2          = b_2
    l_31 y_1 + l_32 y_2 + l_33 y_3 = b_3
                                               (5.4)
    x_1 + u_12 x_2 + u_13 x_3 = y_1
          x_2 + u_23 x_3      = y_2
                x_3           = y_3

The values of y_1, y_2, y_3 can be computed in succession from
the first set of equations and then x_1, x_2, x_3 can be com-
puted in succession from the second set. If we solve the
first set of equations in (5.4) for the y_i we see that

    y_1 = b_1/l_11,
    y_2 = (b_2 - l_21 y_1)/l_22,               (5.5)
    y_3 = (b_3 - l_31 y_1 - l_32 y_2)/l_33.
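The whole compact procedure extends directly to the n × n case, and can be sketched in Python as follows. This is an illustrative rendering only (names are mine), and it attempts no pivoting, so it fails if some l_rr turns out to be zero:

```python
def crout_solve(a, b):
    """Solve Ax = b by the compact (Crout) method of section 5.2:
    decompose A = LU with u_ii = 1, then substitute forwards and back."""
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for r in range(n):
        # Column r of L: l_ir = a_ir minus sum of l_ik u_kr over earlier k.
        for i in range(r, n):
            l[i][r] = a[i][r] - sum(l[i][k] * u[k][r] for k in range(r))
        # Row r of U: the same rule, then divide by the diagonal l_rr.
        u[r][r] = 1.0
        for j in range(r + 1, n):
            u[r][j] = (a[r][j] - sum(l[r][k] * u[k][j] for k in range(r))) / l[r][r]
    # Forward substitution Ly = b, as in (5.5).
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back-substitution Ux = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(u[i][k] * x[k] for k in range(i + 1, n))
    return x
```

Notice that each element is formed from quantities computed at earlier stages, exactly as in the hand layout; in hand work the sums of products would be accumulated in one machine operation.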
Before considering the organisation of the calculation
we show that numerical checks can be obtained in the
following way. We first of all define the row sums

    s_1 = a_11 + a_12 + a_13 + b_1,
    s_2 = a_21 + a_22 + a_23 + b_2,
    s_3 = a_31 + a_32 + a_33 + b_3.
A little reflection shows that if the b_i on the right of the
first set of equations in (5.4) are replaced by s_i, then the y_i
on the right of the second set are replaced by the quantities
    Σ_1 = 1 + u_12 + u_13 + y_1,
    Σ_2 = 1 + u_23 + y_2,                      (5.6a)
    Σ_3 = 1 + y_3.
The same quantities are obtained if we follow the procedure
used to derive the y_i:

    σ_1 = s_1/l_11,
    σ_2 = (s_2 - l_21 Σ_1)/l_22,               (5.6b)
    σ_3 = (s_3 - l_31 Σ_1 - l_32 Σ_2)/l_33.
Algebraically the σ_i and Σ_i represent identical quantities
but in numerical work they are computed in different ways
so that the numerical values may differ slightly in the last
significant figure due to round-off error. The approximate
equality of the σ_i and Σ_i serves as an independent check on
the calculation.
We now describe a practical procedure for carrying out
the above method.† The symbols have been defined above
except for the column check-sums (cf. § 4.2):

    S_i = a_1i + a_2i + a_3i   (i = 1, 2, 3),   S_4 = b_1 + b_2 + b_3,
    S = S_1 + S_2 + S_3 + S_4,   s = s_1 + s_2 + s_3.
It is convenient to divide the calculation into three stages.
† P. D. Crout, Trans. Amer. Inst. Electr. Engrs, 60 (1941), 1235-40.
Another layout is given in Modern Computing Methods, H.M.
Stationery Office, 2nd Edition (1961), Chapter 1. Which of these
layouts is recommended depends on personal preference.
(i) Write down the original matrix with check sums:

    a_11  a_12  a_13  b_1  (s_1)
    a_21  a_22  a_23  b_2  (s_2)
    a_31  a_32  a_33  b_3  (s_3)
    (S_1) (S_2) (S_3) (S_4) (S = s).
(ii) Write down an auxiliary matrix. The method of
deriving this matrix is given below. It may be
noted at this point that the u_ij, y_i, and σ_i are all
determined by following exactly the same rule
(cf. (b), (d), (5.5), and (5.6b) above, and the rule
given in the paragraph preceding Ex. 5.1 below).

    l_11  u_12  u_13  y_1  (σ_1 ≈ Σ_1)
    l_21  l_22  u_23  y_2  (σ_2 ≈ Σ_2)
    l_31  l_32  l_33  y_3  (σ_3 ≈ Σ_3).
(iii) Obtain the unknowns by back-substitution in the
second set of equations in (5.4) and apply the
final check:

    S_1 x_1 + S_2 x_2 + S_3 x_3 ≈ S_4.
The elements of the auxiliary matrix are written down in
the following order:

(a) Elements of the first column, l_i1.
(b) Elements of the first row to the right of the diagonal,
    u_1j, y_1, σ_1. Then apply a check:

        Σ_1 = 1 + u_12 + u_13 + y_1 ≈ σ_1.

(c) Elements of the second column on and below the
    diagonal, l_i2.
(d) Elements of the second row to the right of the
    diagonal, u_2j, y_2, σ_2. (Notice that Σ_1, not σ_1, is
    used to compute σ_2 (cf. Ex. 4.2).) Then apply
    the check:

        Σ_2 = 1 + u_23 + y_2 ≈ σ_2.

(e) Elements of the third column on and below the
    diagonal, l_33.
(f) Elements of the third row to the right of the diagonal,
    y_3, σ_3. (Note that Σ_1, Σ_2, not σ_1, σ_2, are used to
    compute σ_3.) Then apply the check:

        Σ_3 = 1 + y_3 ≈ σ_3.
The elements of the auxiliary matrix are determined by
the following general rule: Each element on or below the
diagonal is equal to the corresponding element in the
original matrix minus the sum of products of elements in
its row with corresponding elements in its column, in the
auxiliary matrix, where these products involve only pre-
viously computed elements. Each element to the right of
the diagonal is obtained by exactly the same rule except
that finally we divide by the diagonal term in the same row.
This last part of the rule applies to the u_ij, y_i, and σ_i but it
does not of course apply to the checks Σ_i which are defined
as in (5.6a). If the matrix A is symmetrical, labour can be
saved by noting that each element u_ij of U is equal to the
symmetrically situated element l_ji of L divided by the
diagonal element l_ii.
Ex. 5.1. Apply the above method to the equations:

    x_1 + 4x_2 +  x_3 +  3x_4 =   1,
        -  x_2 + 3x_3 +   x_4 =  -4,
    3x_1 +  x_2 + 6x_3 - 10x_4 = -11,          (5.7)
    x_1 - 2x_2 + 5x_3          =   1.
Show that the auxiliary matrix, including check columns,
is

    1    4    1    3    1   (10)
    0   -1   -3   -1    4    (1)
    3  -11  -30    1   -1    (1)
    1   -6  -14    5    2    (3),

and deduce the solution

    x_1 = 10,   x_2 = -3,   x_3 = -3,   x_4 = 2.
Ex. 5.2. Solve the equations of Ex. 4.2 by the above method.
The auxiliary matrix is (omitting the check column)

    0.5400   0.96907   0.80704   0.67074   2.41019
    0.5233   0.28009  -0.21466   0.62373  -3.64080
    0.4358  -0.06012   0.02349   1.80886   0.05765
    0.3622   0.17470   0.04249   0.00893  -6.35751

Roots:  x_1 = -5.3717,  x_2 = 2.8055,  x_3 = 11.5575,  x_4 = -6.3575.
Errors: δx_1 = -0.0151, δx_2 = 0.0082, δx_3 = 0.0154, δx_4 = -0.0078.

It is seen that the errors are almost exactly the same as those
obtained by the method of successive elimination in § 4.2
(see Table 4.1). Theoretically we should expect the compact
method to be slightly more accurate than the method of
successive elimination since there are fewer round-off errors.
Ex. 5.3. Show that the compact method gives the following
results for the scaled equations in Ex. 4.3:

Roots:  x_1 = -5.3715,  x_2 = 2.8054,  x_3 = 11.5575,  x_4 = -6.3574.
Errors: δx_1 = -0.0153, δx_2 = 0.0083, δx_3 = 0.0154, δx_4 = -0.0079.
Ex. 5.4. Derive the compact method corresponding to the
method of successive elimination of § 4.2, in the following
way, from equations (4.4)-(4.10).

We have e_4 = d_4 - l_43 d_3. In this expression substitute
for d_4 in terms of c_4, and then for c_4 in terms of b_4 to find

    e_4 = b_4 - l_41 b_1 - l_42 c_2 - l_43 d_3.

Similarly show that all the terms in (4.5), (4.8), (4.10) can
be expressed in terms of quantities which occur in these
equations, the original matrix, and the l_ij. Show that a
similar result holds for the u_ij. Hence it is unnecessary to
record the quantities in the square brackets in (4.6), (4.7),
(4.9). Show that the resulting formulae are in fact exactly
those involved in solving Ax = b by decomposing A in the
form A = LU, where L has units along the principal
diagonal.

In the compact method described above we have
arranged that it is U which has units along the diagonal
instead of L. Show that this is a condensed form of the
method of successive elimination in which, for instance,
we divide the row of quantities a_1j, b_1, s_1 in (4.4) by a_11
instead of dividing the column a_i1 by a_11, and similarly for
the other arrays.
There are various methods of choosing the diagonal
elements in L and U, and arranging the methods of successive
elimination and compact elimination. The precise variant
which is chosen is largely a matter of personal preference.
§ 5.3. The inverse matrix. Suppose that, given an n × n
matrix A, there exists a matrix Z such that

    AZ = I.

Then it is possible to show that ZA = I. The matrix Z
is called the inverse of A and it is usually denoted by A⁻¹.
When A⁻¹ exists we have

    A⁻¹A = AA⁻¹ = I.   (5.8)

From this relation, using (5.1), we have

    det A · det A⁻¹ = 1.

Hence a necessary condition for the existence of the inverse
of A is that det A ≠ 0. If this is true then the matrix A is
said to be non-singular, but if det A = 0 the matrix A is
said to be singular and in this case the inverse does not
exist.
Instead of dividing by matrices we multiply by the
inverse matrix. As an example consider the simultaneous
linear equations (4.1) which in matrix notation are

    Ax = b.   (5.9)

We assume that the determinant of the coefficients is
non-zero so that a unique solution of the equations exists,
and the inverse matrix also exists. To solve (5.9) we wish
to remove A from the left-hand side. Premultiply both
sides by A⁻¹:

    A⁻¹Ax = A⁻¹b.

On using (5.8) we have

    x = A⁻¹b.   (5.10)
One practical procedure for the determination of the
inverse matrix is obvious from its definition. Denote the
jth column of A⁻¹ by z_j and let d_j denote the n × 1 column
matrix whose elements are zero except for the jth element
which is unity. Then AA⁻¹ = I gives

    Az_j = d_j,   (j = 1, 2, ... n).

These are just n sets of simultaneous linear equations with n
different right-hand sides of simple form, but the same
coefficient matrix on the left. They can be solved by
extending any method for solution of a single set of equa-
tions. As an example consider the extension of the compact
method of § 5.2. The " original matrix " will appear as
follows:

    a_11  a_12  a_13  1  0  0  (s_1)
    a_21  a_22  a_23  0  1  0  (s_2)           (5.11)
    a_31  a_32  a_33  0  0  1  (s_3)
    (S_1) (S_2) (S_3) (1) (1) (1) (S = s).
There is still only one check column, obtained by adding
all the elements in the corresponding rows. The " auxiliary
matrix " will be of the form

    l_11  u_12  u_13  m_11              (σ_1 ≈ Σ_1)
    l_21  l_22  u_23  m_21  m_22        (σ_2 ≈ Σ_2)
    l_31  l_32  l_33  m_31  m_32  m_33  (σ_3 ≈ Σ_3).
In the position occupied by the unit matrix I in the original
matrix there is a lower triangular matrix with elements m_ij.
The calculation can be completed by forming a back-
substitution matrix of the following form:

    1  u_12  u_13    A_11  A_12  A_13
       1     u_23    A_21  A_22  A_23
             1       A_31  A_32  A_33.

The A_ij are the required elements of A⁻¹. Checks are
given by

    S_1 A_1j + S_2 A_2j + S_3 A_3j = 1,   (j = 1, 2, 3).

The square matrix on the left of the back-substitution
matrix is simply the upper triangular matrix U. The
elements A_ij are obtained in the order: last row from left
to right, second last row from left to right, and so on.
The last row is the same as the corresponding part of the
last row in the auxiliary matrix. The second last row is
given by the formula

    A_2j = m_2j - u_23 A_3j,   (j = 1, 2, 3),

where of course m_23 is taken to be zero. The general rule
for the element A_ij is: Any element is given by the cor-
responding element in the auxiliary matrix less the sum of
products of previously computed elements in its column
and corresponding elements in its row in the U matrix.
The kth element in its column is taken with the kth element
in the row of the U matrix.
The method extends directly to the general n × n case.
It is seen that the amount of labour involved in finding an
inverse matrix is about three times the work involved in
solving a single set of equations, independent of the value
of n.
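The idea of solving A z_j = d_j for each unit column d_j can be sketched in Python. Here the n right-hand sides are carried along as an augmented block [A | I] and processed by Gauss-Jordan elimination, which is one convenient way of organising the same arithmetic (the function name and layout are illustrative only):

```python
def invert(a):
    """Invert an n x n matrix by solving A z_j = d_j for each unit
    column d_j, organised as Gauss-Jordan elimination on [A | I].
    Partial pivoting guards against zero pivots."""
    n = len(a)
    # Augment A with the n unit right-hand sides.
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for r in range(n):
        p = max(range(r, n), key=lambda i: abs(m[i][r]))
        m[r], m[p] = m[p], m[r]
        piv = m[r][r]
        m[r] = [v / piv for v in m[r]]
        # Clear column r in every other row.
        for i in range(n):
            if i != r and m[i][r] != 0.0:
                f = m[i][r]
                m[i] = [v - f * w for v, w in zip(m[i], m[r])]
    return [row[n:] for row in m]
```

Since all n right-hand sides share the same elimination of A, the total work is only a modest multiple of that for a single system, in line with the factor of about three quoted above.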
Ex. 5.5. Find A⁻¹ for the matrix A of the equations in
Ex. 4.2.

Part of the auxiliary matrix has already been given in
Ex. 5.2.

    A⁻¹ = [  399.8  -213.3  -401.7   198.2 ]
          [ -213.3   120.2   214.1  -113.3 ]
          [ -401.7   214.1   409.0  -202.6 ]
          [  198.2  -113.3  -202.6   112.0 ]
Ex. 5.6. Solve the equations in Ex. 4.2 by the formula
x = A⁻¹b, using the inverse matrix found in Ex. 5.5.

Roots:  x_1 = -5.3822,  x_2 = 2.8157,  x_3 = 11.5627,  x_4 = -6.3607.
Errors: δx_1 = -0.0046, δx_2 = -0.0020, δx_3 = 0.0102, δx_4 = -0.0046.
Note that in this example the individual terms of A⁻¹
are large compared with the final roots. This indicates that
the equations are ill-conditioned since in the sum of products
required to form the roots large terms cancel to give small
end-results. On the other hand the degree of ill-conditioning
is not excessive.
We make the following comments in connection with
the discussion of pivots and scaling in §4.3. The elements of
A⁻¹ in Ex. 5.5 are comparable with each other in size.
Since we are using fixed-point working and the elements of
A are comparable this means that the errors of the elements
of A⁻¹ should be comparable. If there is considerable
variation in the sizes of the largest elements in the rows
(or columns) of an inverse we have to be careful that we are
not losing accuracy unnecessarily in the sense that the
accuracy of the inverse could be improved by scaling the
original matrix. In the method described above for in-
verting a matrix we can obtain various choices of pivots by
rearranging the rows and columns of the original matrix.
Whatever choice of pivots is used the inverse of the last
pivot is always an element of the inverse matrix. If the last
pivot is small then at least one element of the inverse matrix
is large, and the equations are to this extent ill-conditioned.
This is one way of seeing that the size of the last pivot has
a second-order effect on accuracy.
The condition of the equations Ax = b depends on the
matrix A and not on b. The equations are ill-conditioned
if, when we check that A⁻¹A is equal to I, terms large
compared with unity have cancelled to give (approximately)
units along the principal diagonal and zeros elsewhere.

If B is a matrix such that BA ≈ I then we say that B
is an approximate left-inverse of A, and similarly if AC ≈ I
the matrix C is an approximate right-inverse of A. Due to
ill-conditioning, even if AC ≈ I the product CA may not be
a good approximation to I. As an example consider

    A = [ 1.00  1.00 ],   C = [ -89   100 ].
        [ 1.00  0.99 ]        [  90  -100 ]

Then

    AC = [ 1.0  0.0 ],   CA = [  11   10 ].
         [ 0.1  1.0 ]         [ -10   -9 ]

A good right-inverse may not be a good left-inverse, and
conversely. The situation in numerical work is quite
different from the situation in the algebraic theory where
the left- and right-inverses are the same.
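The numbers in this example are small enough to verify directly; the short Python check below (using an ad hoc `matmul` helper) reproduces the two products:

```python
def matmul(a, b):
    """c_ik = sum over j of a_ij b_jk, for small dense matrices."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]

# The ill-conditioned example from the text: C is a good approximate
# right-inverse of A but a poor left-inverse.
A = [[1.00, 1.00], [1.00, 0.99]]
C = [[-89.0, 100.0], [90.0, -100.0]]
AC = matmul(A, C)   # close to the unit matrix
CA = matmul(C, A)   # far from the unit matrix
```

The large entries of C, cancelling in one order of multiplication but not in the other, are exactly the signature of ill-conditioning described above.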
We conclude this chapter by expressing in terms of
matrices the discussion at the end of § 4.4 concerning
accuracy when solving linear equations. We define a
column matrix of residues, r, by the equation

    r = AX - b,

where X is an approximate set of roots. If the exact roots
x are related to the approximate roots by the equation

    x = X + δx,

where δx is a column matrix of errors, we readily find

    A δx = -r,   i.e.   δx = -A⁻¹r.   (5.12)

This is the matrix form of (4.22).
If errors in the coefficients are given by

    δA = [δa_ij],   δb = [δb_i],

and the corresponding new roots are denoted by x + δx,
we have

    (A + δA)(x + δx) = b + δb.

On neglecting the second-order term δA δx we find

    δx = A⁻¹(δb - δA x).   (5.13)

This is the matrix form of (4.23).
When using (5.12) it is essential to evaluate the residuals
r to the number of significant figures required in δx, and
since the residuals are small this may mean using a large
number of decimal places when calculating the residuals.
On the other hand A⁻¹ need only be known to the number
of significant figures required in δx, which may mean that
only an approximate determination of A⁻¹ is required. If
an approximate value of A⁻¹ is known then the roots of
Ax = b can be evaluated as accurately as required by the
iteration

    x_0 = (A⁻¹)_approx b,
    x_{s+1} = x_s - (A⁻¹)_approx r_s,   (s = 0, 1, 2, ...),   (5.14)
where x_s is the sth approximation to x. The residuals must
be evaluated with the appropriate accuracy at each stage.
This is an illustration of the principle that in an iterative
procedure it is often possible to use approximations in
certain parts of the computation, provided that the requisite
accuracy is maintained in certain vital parts of the calcula-
tion.
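The iteration (5.14) can be sketched in Python. The small 2 × 2 example in the test data (the matrix, its deliberately rounded inverse, and the step count) is my own invention, used only to show the residuals being driven down while the crude approximate inverse is never improved:

```python
def refine(a, b, approx_inv, steps=5):
    """Iterative refinement (5.14): x_{s+1} = x_s - (A^-1)_approx r_s,
    where r_s = A x_s - b.  Only an approximate inverse is needed,
    provided the residuals themselves are computed accurately."""
    n = len(b)
    mv = lambda m, v: [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    x = mv(approx_inv, b)                     # x_0 = (A^-1)_approx b
    for _ in range(steps):
        r = [ri - bi for ri, bi in zip(mv(a, x), b)]   # residual A x - b
        dx = mv(approx_inv, r)
        x = [xi - di for xi, di in zip(x, dx)]
    return x
```

This illustrates the principle stated above: the approximation (A⁻¹)_approx may be quite rough, so long as the vital quantities, the residuals, are formed to full accuracy at every stage.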
Ex. 5.7. Check the results of Exs. 4.4, 4.5 by using the
inverse matrix in Ex. 5.5 in conjunction with (5.12),
(5.13).
When working with simultaneous linear equations it
often happens, as in Ex. 5.7 compared with Exs. 4.4, 4.5,
that we can either evaluate z = A⁻¹d if A⁻¹ is available,
or we can solve Az = d where the major part of the work
has already been done since another set of equations Ax = b
has already been solved. To some extent the choice is a
matter of personal preference. In hand computing it may
be preferable to work with A⁻¹ since multiplication by A⁻¹
is rather more straightforward than adding an extra column
to an auxiliary matrix and back-substituting. Also the
size of the terms in A⁻¹ gives some indication of the
condition of the matrix. When using a computer it may be
preferable to solve an extra set of equations, but this of
course depends on what is stored inside the machine.
Examples V
Ex. 5.8. Write a computer program (or a flow-
diagram) for the solution of linear equations by the compact
method of § 5.2.
Ex. 5.9. Write a computer program (or a flow-diagram)
for the inversion of a matrix using
(i) the compact method explained in connection with
(5.11),
(ii) the successive elimination method of § 4.2 (compare
Program 4.1).
Ex. 5.10. A compact method which is sometimes
useful when it is desired to avoid small pivots is the square-
root method in which we choose l_ii = u_ii in the decomposi-
tion A = LU. Work out the details and show that when A
is symmetrical U is equal to the transpose of L. Apply the
method to the example in Ex. 4.2 and show that the errors
are about two-thirds of those by the method of Ex. 4.2
(Table 4.1). (The labour involved in taking square roots is
justified only if otherwise the pivots become undesirably
small in the sense discussed in § 4.3.)
Ex. 5.11. Show that the band or tridiagonal matrix A
defined by

    a_ij = 0   for   |i - j| > 1

can be decomposed in the form LU where L is a lower
triangular matrix with l_ij = 0 for i - j > 1 and U is an upper
triangular matrix with u_ii = 1 and u_ij = 0 for j - i > 1.
Find L and U for

    A = [  2  -1   0   0 ]
        [ -1   2  -1   0 ]
        [  0  -1   2  -1 ]
        [  0   0  -1   2 ]

Consider the procedure in Ex. 4.9 from the point of view
of the compact elimination method of § 5.2.
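For a tridiagonal matrix the compact decomposition collapses to a pair of one-term recurrences: the subdiagonal of L is simply the subdiagonal of A, and only the diagonal of L and the superdiagonal of U need computing. A Python sketch (names and argument layout are mine) under these assumptions:

```python
def tridiag_lu(a, b, c):
    """LU decomposition of a tridiagonal matrix as in Ex. 5.11:
    diagonal b, subdiagonal a (a[0] unused), superdiagonal c.
    Returns the diagonal of L and the superdiagonal of U
    (U has units on its diagonal; the subdiagonal of L equals a)."""
    n = len(b)
    l_diag = [0.0] * n        # the l_jj
    u_sup = [0.0] * (n - 1)   # the u_{j, j+1}
    l_diag[0] = b[0]
    for j in range(1, n):
        u_sup[j - 1] = c[j - 1] / l_diag[j - 1]
        l_diag[j] = b[j] - a[j] * u_sup[j - 1]
    return l_diag, u_sup
```

Comparing these recurrences with those of Ex. 4.9 shows that the g_j there are just the superdiagonal elements of U, which is the matrix-decomposition view suggested in that example.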
Ex. 5.12. Find the exact inverse of the matrix A given
at the end of Ex. 5.11, using the compact method explained
in connection with (5.11), expressing all non-integral
numbers as fractions instead of using decimals.
Ex. 5.13. Show that (AB)' = B'A', and (AB)⁻¹
= B⁻¹A⁻¹ if both inverses exist.
Ex. 5.14. Evaluate det A, the determinant of the
coefficients a_ij, for the example in Ex. 4.2, to four significant
figures. (The pivots in Ex. 4.2 where five decimals were
carried in the intermediate calculations give det A ≈ 0.3173
× 10⁻⁴. A similar calculation carrying six decimals gives
det A ≈ 0.3167 × 10⁻⁴. Keeping one extra decimal place
does not mean that the error is reduced by exactly one
decimal place, since round-off errors occur at random.
But the results indicate that the last result is correct to
one digit in the fourth significant figure. How could the
accuracy be checked if you were using a digital computer?)
Ex. 5.15. Suppose that we are given m pairs of points
(x_j, y_j) and it is required to represent the numbers y_j
approximately by a relation of the form y_j ≈ f(x_j) where

    f(x) = z_1 f_1(x) + z_2 f_2(x) + ... + z_n f_n(x),   (n < m),

where the f_i(x) are known functions and the z_i are constants
which are to be determined. This procedure is known as
curve-fitting. We set

    ε_s = Σ_{i=1}^{n} z_i f_i(x_s) - y_s,   (s = 1 to m),

and determine the z_i by the principle of least squares. We
minimise

    ε_1² + ε_2² + ... + ε_m².

Partial differentiation with respect to z_j (j = 1 to n) gives n
equations in n unknowns which can be written, in matrix
form,

    A'Az = A'y,

where

    A = [a_si] = [f_i(x_s)],   z = [z_i],   y = [y_s],

and A is m × n, z is n × 1, y is m × 1.
These equations can be badly conditioned, especially if
the points (x_j, y_j) tend to group in clusters. In such cases
it may be advisable to replace a cluster of say p points by a
single point at the centre of gravity of the cluster, repeated
p times.

Fit a quadratic to the following points:

    x_j   0.52  1.22  2.06  3.47  4.00  4.60
    y_j   2.90  1.72  1.20  2.00  2.61  4.05
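The normal equations A'Az = A'y can be formed and solved in a few lines of Python; the sketch below (function name and layout mine) uses plain elimination without pivoting, which is adequate for this small well-scaled example but, as remarked above, the normal equations can be badly conditioned in general:

```python
def lstsq_fit(xs, ys, funcs):
    """Least-squares coefficients z solving the normal equations
    A'Az = A'y of Ex. 5.15, with a_si = f_i(x_s)."""
    m, n = len(xs), len(funcs)
    a = [[f(x) for f in funcs] for x in xs]
    # Form A'A and A'y.
    ata = [[sum(a[s][i] * a[s][j] for s in range(m)) for j in range(n)]
           for i in range(n)]
    aty = [sum(a[s][i] * ys[s] for s in range(m)) for i in range(n)]
    # Solve the small n x n system by successive elimination.
    for r in range(n):
        for i in range(r + 1, n):
            f = ata[i][r] / ata[r][r]
            for j in range(r, n):
                ata[i][j] -= f * ata[r][j]
            aty[i] -= f * aty[r]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(ata[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aty[i] - s) / ata[i][i]
    return z

# Fit a quadratic z_1 + z_2 x + z_3 x^2 to the data of Ex. 5.15.
xs = [0.52, 1.22, 2.06, 3.47, 4.00, 4.60]
ys = [2.90, 1.72, 1.20, 2.00, 2.61, 4.05]
z = lstsq_fit(xs, ys, [lambda x: 1.0, lambda x: x, lambda x: x * x])
```

With the basis 1, x, x² the functions f_i are evaluated at each data point to build A, after which everything reduces to the linear-equation methods of this chapter.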
Ex. 5.16. Show that the inverse of the matrix of
coefficients of the equations (5.7) is

    A⁻¹ = (1/150) [ 27  -271  -19  180 ]
                  [ 21    67   13  -60 ]
                  [  3    81    9  -30 ]
                  [ 12   -26  -14   30 ]
Ex. 5.17. The effect on the solution of a set of simul-
taneous linear equations Ax = b of modifying one of the
coefficients a_ij can be analysed in the following way.
Suppose that we know A⁻¹ = [A_ij] and we wish to solve
(A + D)x = b where D is an n × n matrix with typical
element d_ij such that d_ij is zero unless i = r, j = s, i.e. d_rs is
the only non-zero element of D. Then

    (I + A⁻¹D)x = A⁻¹b.

If x* is the solution of Ax* = b show that

    x_s = x*_s / (1 + A_sr d_rs),
    x_j = x*_j - A_jr d_rs x_s,   (j ≠ s).
Show that the solution of (5.7) with the second equation
replaced by

    -x_2 + 2x_3 + x_4 = -4

is given by

    23x_1 = 501,   23x_2 = -136,   23x_3 = -150,   23x_4 = 72.
CHAPTER VI
EIGENVALUES AND EIGENVECTORS
§6.1. Introduction. Many of the applications of
matrices involve eigenvalues and eigenvectors. To intro-
duce the ideas which are involved we discuss a simple
physical example. Further applications occur, for instance,
in connection with the principal axes of a quadric, and the
principal moments of inertia of a solid body.† The reader
who does not find physical examples illuminating can start
with the mathematical definition (6.3) below.
Consider three particles, each of mass m, at equal
distances l along an elastic string of length 4l, where the
string is fixed at both ends. If the particles execute small
transverse vibrations under no external forces, the equations
of motion are

    m d²X_1/dt² = -T X_1/l + T(X_2 - X_1)/l,

    m d²X_2/dt² = -T(X_2 - X_1)/l + T(X_3 - X_2)/l,

    m d²X_3/dt² = -T(X_3 - X_2)/l - T X_3/l,
† See W. H. McCrea, Analytical Geometry of Three Dimensions,
Oliver and Boyd (1960) and D. E. Rutherford, Classical Mechanics,
Oliver and Boyd (1960). For mathematical properties of eigenvalues
and eigenvectors (latent roots and vectors) see A. C. Aitken, Deter-
minants and Matrices, Oliver and Boyd (1962).
where X_1, X_2, X_3 are the displacements of the three
particles and T is the tension in the string. If all quantities
vary sinusoidally with time we can set

    X_r = x_r e^{iωt},   (r = 1, 2, 3).

Then the above equations become

    (2 - λ)x_1 - x_2 = 0,
    -x_1 + (2 - λ)x_2 - x_3 = 0,               (6.1)
    -x_2 + (2 - λ)x_3 = 0,
where λ = ω²ml/T. This is a homogeneous set of simul-
taneous linear algebraic equations which in general possesses
only the trivial solution x_1 = x_2 = x_3 = 0. However,
non-zero solutions will exist for special values of λ. From
Cramer's rule (4.2) for the solution of linear equations in
terms of the ratios of determinants we deduce that a necessary
condition for the existence of a non-zero solution of (6.1)
is that the determinant of the coefficients is zero:

    | 2-λ   -1    0  |
    | -1   2-λ   -1  |  =  0.                  (6.2)
    |  0    -1   2-λ |

By expanding this determinant we obtain a cubic equation
in λ which can be solved to give the three special values
of λ for which non-zero solutions of (6.1) exist, namely

    λ = 2 - √2,   2,   2 + √2.
If in (6.1) we set λ = 2 - √2 then we find that the resulting
equations have a solution x_1 = C, x_2 = C√2, x_3 = C,
where C is any arbitrary constant. The physical meaning
of this is that corresponding to λ = 2 - √2 there is a free
vibration of angular frequency ω given by ω² = T(2 - √2)/ml,
and corresponding to this frequency of vibration there is a
mode of oscillation such that the ratios x_1 : x_2 : x_3 are given
by 1 : √2 : 1. Similarly

    for λ = 2,        x_1 : x_2 : x_3 = 1 : 0 : -1,
    for λ = 2 + √2,   x_1 : x_2 : x_3 = 1 : -√2 : 1.
This analysis shows that there are three and only three
frequencies of free vibration. Corresponding to each of
these frequencies there exists a solution of equations (6.1)
which gives the corresponding mode of oscillation. The
modes for this particular case are shown graphically in
Fig. 6.1. The reader should visualise these modes physically.
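The three eigenvalues and modes quoted above are easily verified by direct substitution into Ax = λx; the following Python check is offered as a numerical confirmation of the vibrating-string example:

```python
import math

# The coefficient matrix of (6.1): A x = lambda x.
A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]

r2 = math.sqrt(2.0)
# (eigenvalue, mode) pairs as found in the text.
modes = [(2 - r2, [1.0, r2, 1.0]),
         (2.0,    [1.0, 0.0, -1.0]),
         (2 + r2, [1.0, -r2, 1.0])]

# Check that each pair satisfies A x = lambda x componentwise.
for lam, x in modes:
    ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    assert all(abs(ax[i] - lam * x[i]) < 1e-12 for i in range(3))
```

Such a substitution check is a useful habit whenever eigenvalues have been obtained by expanding and solving the characteristic equation by hand.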
We now express these ideas in more precise mathematical
terminology. If A is an n × n matrix then, generalising (6.1),
we consider the equations

    (A - λI)x = 0,   or   Ax = λx.   (6.3)

From Cramer's rule (4.2) these equations possess only the
trivial solution x = 0 unless the determinant of the co-
efficients is zero:

    det(A - λI) = | a_11 - λ   a_12     ...   a_1n     |
                  | a_21     a_22 - λ   ...   a_2n     |  =  0.   (6.4)
                  | ................................... |
                  | a_n1       a_n2     ...   a_nn - λ |
This is a polynomial equation in λ. The roots of this
algebraic equation are special values of λ for which the
simultaneous equations (6.3) possess non-zero solutions.
They are called the eigenvalues of the matrix A and will be
denoted by λ_i (i = 1 to n). The λ_i are sometimes called
latent roots, characteristic roots, or proper values. Cor-
responding to each of the λ_i there will be a solution of (6.3)
of the form Cx_i, where x_i is a non-zero vector and C is an
arbitrary constant. These solutions are called the eigen-
vectors (or latent vectors, characteristic vectors, proper
vectors). We have

    Ax_i = λ_i x_i.   (6.5)
We now deduce some properties of the eigenvalues and
eigenvectors for the special case where the aᵢⱼ are real and
A is symmetrical.

[Fig. 6.1 Modes of vibration of three particles on a string
(λ₁ = 2 − √2, λ₂ = 2, λ₃ = 2 + √2).]

We first of all show that the eigenvalues and eigenvectors
are then real. In the general case, since the λᵢ are roots of
an algebraic equation, the eigenvalues may be complex and
therefore the elements of xᵢ may be complex. Denote by x̄ᵢ
a vector whose elements are the complex conjugates of those
of xᵢ. The transpose x̄ᵢ' of x̄ᵢ is a row vector whose pth
element is the pth element of the column vector x̄ᵢ.
Multiplication of (6.5) by x̄ᵢ' gives

λᵢ = x̄ᵢ'Axᵢ / x̄ᵢ'xᵢ.   (6.6)
(We adopt the convention that if y, z are n × 1 column
vectors with elements yᵢ, zᵢ (i = 1 to n), then y'z denotes the
scalar defined by

y'z = y₁z₁ + y₂z₂ + … + yₙzₙ,

the inner product of y and z. Strictly speaking y'z is a 1 × 1
matrix, and the inner product should be denoted by some
special symbol such as x . y or (x, y), but no confusion will
arise from our usage.) On writing x̄ᵢ'xᵢ and x̄ᵢ'Axᵢ in full
it is seen, on using the assumption that A is symmetrical,
that both of these expressions are real. Hence the λᵢ are
real. Corresponding to each of the λᵢ, equations (6.3) will
possess a solution of the form Cxᵢ where the elements of
xᵢ are real. From now on we can therefore drop the bars.
If we multiply (6.5) by xⱼ', (j ≠ i), write down the
corresponding expression with i and j interchanged, and
subtract the two expressions, we obtain, since xⱼ'Axᵢ = xᵢ'Axⱼ
and xⱼ'xᵢ = xᵢ'xⱼ,

(λᵢ − λⱼ)xⱼ'xᵢ = 0,  (i ≠ j).

We shall assume that the eigenvalues are distinct, i.e.
λᵢ ≠ λⱼ. Then

xⱼ'xᵢ = 0,  (i ≠ j).   (6.7)
If this equation is satisfied the vectors xᵢ and xⱼ are said to
be orthogonal, and the eigenvectors xᵢ form an orthogonal set
of vectors. Since the elements of an eigenvector are derived
from homogeneous equations, these elements can be multi-
plied by an arbitrary constant. It is usually convenient to
adjust the size of the elements by some standard rule. This
procedure is called normalisation and it can be done in
various ways. We shall use two kinds of normalisation in
this chapter:
(i) The eigenvectors can be normalised so that the
    numerically largest element of each eigenvector is
    unity.

(ii) An eigenvector xᵢ with elements d₁, d₂, …, dₙ can
    be normalised so that

    xᵢ'xᵢ = d₁² + d₂² + … + dₙ² = 1.

    In this case we say that the eigenvectors are
    normalised to have unit length.
If a set of vectors is such that it is impossible to find
constants aᵢ, not all of which are zero, such that

a₁x₁ + a₂x₂ + … + aₙxₙ = 0,   (6.8)

then the vectors are said to be linearly independent. A
useful deduction from this definition is that if a relation of
the form (6.8) holds between linearly independent vectors
then a₁ = a₂ = … = aₙ = 0. Orthogonal vectors are
linearly independent for, if a relation of the form (6.8)
exists, we see on multiplying by xᵢ' that aᵢ = 0, and this
holds for all i.
An arbitrary n × 1 vector u can be expressed uniquely as
a sum of multiples of n orthogonal n × 1 vectors:

u = a₁x₁ + a₂x₂ + … + aₙxₙ,   (6.9)

where the aᵢ are suitable constants. For if the aᵢ are regarded
as unknowns then (6.9) is an n × n set of simultaneous linear
equations for the aᵢ. The determinant of the coefficients
is non-zero, since otherwise the sum of certain multiples of
the columns of the determinant would be zero, i.e. (6.8)
would be true, but we have just shown that this is impossible.
Hence the determinant of the coefficients is non-zero, and
values for the aᵢ exist and are unique. In fact we can easily
write down the solution explicitly, since multiplication of
(6.9) by xᵢ' gives

aᵢ = xᵢ'u / xᵢ'xᵢ.
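The expansion (6.9) and the coefficient formula above can be sketched in a few lines of code. The following Python fragment (the language and the names, such as `inner`, are ours for illustration, not the book's) resolves an arbitrary vector along an orthogonal set and reconstructs it; the orthogonal set used here is an arbitrary example, not one taken from the text.

```python
# Sketch of the expansion (6.9): an arbitrary vector u is resolved along
# a set of mutually orthogonal vectors x_i, with a_i = x_i'u / x_i'x_i.

def inner(y, z):
    """Inner product y'z of two vectors stored as lists."""
    return sum(yi * zi for yi, zi in zip(y, z))

# An orthogonal (not normalised) set in three dimensions, chosen for
# illustration only.
xs = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
u = [3.0, -1.0, 5.0]

# Coefficients a_i = x_i'u / x_i'x_i.
a = [inner(x, u) / inner(x, x) for x in xs]

# Reconstruct u from the expansion u = a_1 x_1 + a_2 x_2 + a_3 x_3.
u_rebuilt = [sum(a[i] * xs[i][k] for i in range(3)) for k in range(3)]
```

Because the xᵢ are orthogonal, each coefficient is obtained independently of the others, which is exactly the point of the derivation above.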
Ex. 6.1. From the results for the problem discussed at the
beginning of § 6.1 deduce that the matrix

A = [  2  −1   0 ]
    [ −1   2  −1 ]        (6.10)
    [  0  −1   2 ]

has eigenvalues λ₁ = 2 − √2, λ₂ = 2, λ₃ = 2 + √2,
with corresponding eigenvectors, normalised to unit
length,

x₁ = [ 1/2  ]    x₂ = [  1/√2 ]    x₃ = [  1/2  ]
     [ 1/√2 ],        [   0   ],        [ −1/√2 ].
     [ 1/2  ]         [ −1/√2 ]         [  1/2  ]

It is readily verified that xᵢ'xⱼ = 0, (i ≠ j).
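The statements of Ex. 6.1 are easy to confirm numerically. The short Python check below (ours, not the book's) verifies that each stated eigenvector satisfies Axᵢ = λᵢxᵢ and that the set is orthogonal.

```python
# Numerical check of Ex. 6.1 for the matrix (6.10).
import math

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
r2 = math.sqrt(2)
lams = [2 - r2, 2.0, 2 + r2]
xs = [[0.5, 1 / r2, 0.5], [1 / r2, 0.0, -1 / r2], [0.5, -1 / r2, 0.5]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(y, z):
    return sum(yi * zi for yi, zi in zip(y, z))

# Largest residual component of A x_i - lambda_i x_i over all i.
residual = max(abs(c - lam * xc)
               for lam, x in zip(lams, xs)
               for c, xc in zip(matvec(A, x), x))

# Largest inner product x_i'x_j for i != j (should vanish, by (6.7)).
ortho = max(abs(inner(xs[i], xs[j]))
            for i in range(3) for j in range(3) if i != j)
```

Both quantities come out at the level of rounding error, confirming the eigenpairs and the orthogonality property (6.7).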
§ 6.2. An iterative method for finding the largest eigen-
value. In the last section we obtained the eigenvalues for
a particular example by direct expansion of the determinant
(6.2). This will not be a very efficient method for deter-
minants of large order. In this section we discuss an
iterative method for finding the largest eigenvalue of a
general n x n matrix.
We assume that the matrix is symmetrical and real,
since this is the case covered by the theory in §6.1. The
eigenvalues are then real, and we shall assume that there is
one eigenvalue of maximum modulus. An extension to the
case where there are two eigenvalues of the same modulus,
but opposite signs, is given in Ex. 6.7. The method given
in this section can be extended to other cases (but these will
not be examined in detail):
(a) If the matrix is real but unsymmetrical, it is necessary
to introduce biorthogonal vectors, as in Ex. 6.9. The
method given in this section applies if there is one eigen-
value of largest modulus. In this case the eigenvalue of
largest modulus will automatically be real, since, if the
matrix is real, complex eigenvalues always occur in con-
jugate pairs. A method for dealing with complex conjugate
dominant eigenvalues is given in Ex. 6.8.
(b) If the matrix has complex elements, it is relatively
easy to extend the theory in § 6.1 to the case of a Hermitian
matrix, where the elements are complex but aⱼᵢ = āᵢⱼ.
The eigenvalues are real, and the method given in this
section applies if there is one eigenvalue of maximum
modulus.
(c) For the general complex matrix the method given in
this section applies, provided there is one eigenvalue of
maximum modulus, though, as in (a), it is necessary to
introduce biorthogonal vectors in the theory.
Suppose that u₀ is an arbitrary n × 1 column matrix.
By successive premultiplication by A form the sequence

u₁ = Au₀,  u₂ = Au₁ = A²u₀,  …,  uₚ = Auₚ₋₁ = Aᵖu₀.   (6.11)

(The suffix p in uₚ is used in a different sense from the
suffix i in xᵢ. This distinction should be borne in mind
throughout this section.) On assuming a representation
of the form (6.9) for u₀ and using (6.5) we see that

u₁ = Au₀ = a₁λ₁x₁ + a₂λ₂x₂ + … + aₙλₙxₙ,
uₚ = Aᵖu₀ = a₁λ₁ᵖx₁ + a₂λ₂ᵖx₂ + … + aₙλₙᵖxₙ   (6.12)
          = a₁λ₁ᵖ{x₁ + Σᵢ₌₂ⁿ (aᵢ/a₁)(λᵢ/λ₁)ᵖxᵢ}.   (6.13)
Suppose next that |λ₁| > |λ₂| ≥ |λ₃| ≥ … ≥ |λₙ|.
Then (λᵢ/λ₁)ᵖ tends to zero as p tends to infinity (i > 1).
If uₚ, xᵢ are the sth elements of uₚ, xᵢ for some specific s,
and we define

lₚ = uₚ/uₚ₋₁,

we find

lₚ = λ₁{1 + Σᵢ₌₂ⁿ Bᵢ(λᵢ/λ₁)ᵖ} / {1 + Σᵢ₌₂ⁿ Bᵢ(λᵢ/λ₁)ᵖ⁻¹},   (6.14)

so that

lₚ → λ₁  as p → ∞,   (6.15)

where Bᵢ = aᵢxᵢ/a₁x₁. In deriving this result we have
made two important assumptions:

(i) There is only one eigenvalue with largest modulus.
    The general case is outside the scope of this
    book (cf. Exs. 6.7, 6.8).

(ii) The trial vector u₀ is such that a₁ ≠ 0. For rapid
    convergence we should try to choose u₀ so that
    a₁ is as large as possible relative to the other aᵢ.
    The approximate form of u₀ is often known on
    physical grounds.
In numerical work, instead of working with the sequence
uₚ, it is convenient to form the sequences zₚ, wₚ, starting from
an arbitrary vector z₀ = l₀w₀, by means of the equations

Awₚ₋₁ = zₚ = lₚwₚ,   (p = 1, 2, 3, …),   (6.16)

where lₚ is now defined as the largest element of zₚ, so that
the largest element of wₚ is unity. Suppose that the largest
element of x₁ is the sth element and that x₁ is normalised
so that this element is unity. A minor difficulty arises in
the theoretical analysis because for small p the largest
element of zₚ (and hence the unit element of wₚ) may not
be the sth element, due to the preponderance of com-
ponents other than x₁ in the initial trial vector. But we
know, from assumption (ii) above, that eventually the
effect of x₁ will predominate. Hence without loss of
generality we can assume that the sth element of zₚ is always
the largest element. We write

w₀ = c{x₁ + α₂x₂ + … + αₙxₙ},

where c is a constant, we denote the sth element of xᵢ (i > 1)
by xᵢ, and the αᵢ are constants whose precise values are not
important. From (6.16) we find

wₚ = {x₁ + Σᵢ₌₂ⁿ αᵢ(λᵢ/λ₁)ᵖxᵢ} / {1 + Σᵢ₌₂ⁿ αᵢ(λᵢ/λ₁)ᵖxᵢ},   (6.17)

where the xᵢ in the denominator are sth elements. As
before, lₚ tends to λ₁ as p tends to infinity. Also wₚ
tends to x₁.
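The iteration (6.16) is only a few lines of code. The Python sketch below (the language and the name `power_iteration` are ours, not the book's) applies it to the matrix (6.10) of Ex. 6.1, whose dominant eigenvalue is 2 + √2 ≈ 3.4142.

```python
# Sketch of the normalised iteration (6.16): z_p = A w_{p-1}, l_p is the
# numerically largest element of z_p, and w_p = z_p / l_p has unit largest
# element.  l_p tends to the dominant eigenvalue, w_p to its eigenvector.
import math

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]

def power_iteration(A, w, steps):
    l = 0.0
    for _ in range(steps):
        z = [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(A))]
        l = max(z, key=abs)          # l_p: the numerically largest element
        w = [zi / l for zi in z]     # w_p now has its largest element = 1
    return l, w

l, w = power_iteration(A, [1.0, 1.0, 1.0], 50)
```

With the trial vector [1, 1, 1] the component along x₂ happens to vanish, a point taken up again in Ex. 6.10, so convergence is faster than the ratio of the two largest eigenvalues would suggest.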
If convergence is slow it will be desirable to use extra-
polation. In order to simplify the analysis we make a
third assumption:

(iii) λ₂ ≠ −λ₃.   (6.18)

(We have already assumed |λ₂| ≥ |λ₃|. This new
assumption imposes an additional restriction.) On using
this assumption we have, for sufficiently large p,

lₚ ≈ λ₁{1 + k(λ₂/λ₁)ᵖ},

where k = α₂x₂(λ₂ − λ₁)/λ₂. An improved approximation
L₁ for λ₁ can be obtained by setting

lₚ = L₁{1 + k(λ₂/λ₁)ᵖ}.   (6.19)

On replacing p by p − 1 and eliminating k we find

L₁ − lₚ = (λ₂/λ₁)(L₁ − lₚ₋₁).   (6.20)
Hence in the terminology of § 2.4 we are dealing with a
first-order iterative procedure. Convergence can be
accelerated by using Aitken's δ²-process (2.15). (Compare
(6.20) and (2.13).) An improved estimate of λ₁ is therefore
given by

L₁ = lₚ − (lₚ − lₚ₋₁)² / (lₚ − 2lₚ₋₁ + lₚ₋₂).   (6.21)

In exactly the same way (6.17) gives

wₚ ≈ x₁ + k(λ₂/λ₁)ᵖ,  where k = α₂(x₂ − x₂x₁),

the scalar x₂ multiplying x₁ denoting, as above, the sth
element of x₂. Hence convergence of the successive
approximate eigenvectors can also be accelerated by using
Aitken's δ²-process. But it must be remembered that the
above analysis is not valid if λ₂ = −λ₃, and the extrapolation
will not be very accurate if λ₂ is approximately equal to −λ₃.

In applying Aitken's formula it is essential that no
numerical mistakes be made in computing the iterates
used. For the special case when A is symmetrical a
numerical check on successive iterates is given by the
following formula derived from (6.16):

lₚ₊₁wₚ₋₁'wₚ₊₁ = lₚwₚ'wₚ.   (6.22)
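Formula (6.21) is easily exercised on a synthetic first-order sequence. The Python sketch below (ours, with illustrative constants that are not from the text) builds a sequence of exactly the form (6.19) and extrapolates; for such a sequence Aitken's formula is exact up to rounding.

```python
# Sketch of Aitken's delta^2 formula (6.21) applied to a synthetic
# sequence l_p = L(1 + k r^p), the exact form assumed in (6.19).

def aitken(l2, l1, l0):
    """Formula (6.21) with l0, l1, l2 = l_{p-2}, l_{p-1}, l_p."""
    return l2 - (l2 - l1) ** 2 / (l2 - 2 * l1 + l0)

L, k, r = 2.7376, 0.3, 0.6           # illustrative values only
seq = [L * (1 + k * r ** p) for p in range(3, 6)]
extrapolated = aitken(seq[2], seq[1], seq[0])
```

The last raw iterate is still in error by several units in the second decimal, while the extrapolated value agrees with L to rounding error, which is the point of the acceleration.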
From (6.20) it is clear that the rate of convergence of the
original first-order iterative procedure depends on λ₁/λ₂.
The larger this ratio the faster the convergence. From
(6.19) it is seen that

λ₁/λ₂ ≈ (lₚ₋₁ − lₚ₋₂)/(lₚ − lₚ₋₁) = tₚ,  say.   (6.23)

The value of tₚ can similarly be estimated from the com-
ponents of wₚ₋₂, wₚ₋₁, wₚ. In terms of tₚ, Aitken's formula
(6.21) is (cf. (2.18b))

L₁ = lₚ + (lₚ − lₚ₋₁)/(tₚ − 1).
In hand computation it is often convenient to compute tₚ
mentally after each iteration and extrapolate using Aitken's
formula as soon as successive iterates begin to vary regularly,
as indicated by agreement between successive values of tₚ.
If successive values of tₚ show no signs of behaving regularly
as p increases, this indicates that some of the basic assump-
tions (i)-(iii) in the above analysis are wrong (cf. Exs. 6.7,
6.8).
Ex. 6.2. Obtain the largest eigenvalue and the corresponding
eigenvector of the following matrix, working to four
decimal places:

A = [ −0.030  −0.242  −0.603   0.178 ]
    [ −0.242   0.860  −0.343   0.393 ]        (6.24)
    [ −0.603  −0.343   1.350   0.251 ]
    [  0.178   0.393   0.251   2.630 ]
Iteration with w₀' = [1, 1, 1, 1] gives

z₆ = 2.7386 [0.0180, 0.1808, 0.1349, 1.0000]',
z₇ = 2.7381 [0.0191, 0.1818, 0.1316, 1.0000]',
z₈ = 2.7379 [0.0198, 0.1825, 0.1296, 1.0000]'.

The numerical check (6.22) gives l₈w₆'w₈ = 2.87708 and
l₇w₇'w₇ = 2.87702. The following values of t₃, …, t₈ are
found: −1.7, −2.2, 1.2, 1.1, 3.0, 2.5. The last two values
are not very accurate due to round-off error. However,
the values of tₚ indicate that successive iterates are beginning
to behave regularly when p = 8, and this can be confirmed
by estimating tₚ from the components of the eigenvectors.
Since the round-off error is starting to become appreciable
compared with the differences between successive iterates
z₆, z₇, z₈, it is advisable to apply Aitken's δ²-process at this
point. (If we were working to five or more decimal places
we might iterate again to confirm the regular behaviour.)
We obtain

x₁ = [ 0.0198 + 0.0012 ]   [ 0.0210 ]
     [ 0.1825 + 0.0016 ] = [ 0.1841 ]
     [ 0.1296 − 0.0031 ]   [ 0.1265 ]
     [ 1.0000          ]   [ 1.0000 ]

A further two iterations give

λ₁ = 2.7376,  x₁ = [0.0207, 0.1836, 0.1265, 1.0000]'.

Further iteration merely repeats these values, which therefore
represent the final estimate of the solution.
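Ex. 6.2 can be reproduced directly. The Python sketch below (ours, not the book's; the matrix entries are as reconstructed in (6.24) above) runs the iteration (6.16) well past convergence and should recover λ₁ ≈ 2.7376 with eigenvector [0.0207, 0.1836, 0.1265, 1.0000].

```python
# Reproducing Ex. 6.2 by iterating (6.16) on the matrix (6.24).

A = [[-0.030, -0.242, -0.603, 0.178],
     [-0.242,  0.860, -0.343, 0.393],
     [-0.603, -0.343,  1.350, 0.251],
     [ 0.178,  0.393,  0.251, 2.630]]

w = [1.0, 1.0, 1.0, 1.0]
for _ in range(60):
    z = [sum(A[i][j] * w[j] for j in range(4)) for i in range(4)]
    l = max(z, key=abs)              # l_p, tending to lambda_1
    w = [zi / l for zi in z]         # w_p, largest element normalised to 1
```

Sixty iterations are far more than the hand computation used; the point is only that the limit agrees with the book's four-decimal values.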
§ 6.3. The determination of subdominant eigenvalues and
eigenvectors. When the largest or dominant eigenvalue
and the corresponding eigenvector of an n × n matrix have
been obtained, various methods are available for replacing
the matrix by an (n−1) × (n−1) matrix with eigenvalues
λ₂, …, λₙ. The largest eigenvalue and corresponding eigen-
vector of this new matrix can then be determined. The
procedure can be repeated until all the eigenvalues and
eigenvectors of the original matrix have been determined.
We confine our attention to the case of a real symmetric
matrix with real, distinct, eigenvalues.
The method described below applies only to symmetric
matrices. It depends essentially on the orthogonality
property of eigenvectors given by equation (6.7). The
method is described sufficiently clearly by an example, and
we give no general discussion. Other methods for removing
known eigenvalues and eigenvectors from a matrix are
described in Modern Computing Methods, H.M. Stationery
Office, 2nd Edn. (1961), Chapter III.
Ex. 6.3. Suppose that the eigenvector corresponding to the
second largest eigenvalue of the matrix used in Ex. 6.2 is
given by x₂' = [x₁, x₂, x₃, x₄]. From the equation x₂'x₁ = 0,
using the value found in Ex. 6.2 for x₁, we find

x₄ = −0.0207x₁ − 0.1836x₂ − 0.1265x₃.   (6.25)

If we write out in full the first linear equation of the matrix
system Ax = λx, where A is the matrix of Ex. 6.2, we have

−0.030x₁ − 0.242x₂ − 0.603x₃ + 0.178x₄ = λx₁.

The unknown x₄ can be eliminated from this equation by
means of (6.25). In the same way x₄ can be eliminated
from the left-hand sides of the other three equations in
Ax = λx, and finally we obtain

−0.0337x₁ − 0.2747x₂ − 0.6255x₃ = λx₁,   (6.26a)
−0.2501x₁ + 0.7878x₂ − 0.3927x₃ = λx₂,   (6.26b)
−0.6082x₁ − 0.3891x₂ + 1.3182x₃ = λx₃,   (6.26c)
 0.1236x₁ − 0.0899x₂ − 0.0817x₃ = λx₄.   (6.26d)
The first three equations represent an eigenvalue problem
for the 3 × 3 matrix

B = [ −0.0337  −0.2747  −0.6255 ]
    [ −0.2501   0.7878  −0.3927 ]        (6.27)
    [ −0.6082  −0.3891   1.3182 ]
Suppose that the largest eigenvalue and the corresponding
eigenvector of the matrix (6.27) have been found by the
method of § 6.2 or otherwise. From the values of x₁, x₂, x₃
the value of x₄ can be found from (6.25), and the final values
of the xᵢ can be checked from (6.26d), or vice versa. In this
way we obtain

λ₂ = 1.6500,  x₂ = [−0.3119, −0.3650, 1.0000, −0.0530]'.
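The elimination that produces (6.26) is mechanical, and it is worth checking that it reproduces the matrix B of (6.27). In the Python sketch below (ours, not the book's), substituting x₄ from (6.25) into row i of Ax = λx replaces aᵢⱼ by aᵢⱼ + aᵢ₄cⱼ, where the cⱼ are the coefficients in (6.25).

```python
# Sketch of the deflation step of Ex. 6.3: eliminating x_4 via (6.25)
# from the first three rows of A x = lambda x gives the matrix B of (6.27).

A = [[-0.030, -0.242, -0.603, 0.178],
     [-0.242,  0.860, -0.343, 0.393],
     [-0.603, -0.343,  1.350, 0.251],
     [ 0.178,  0.393,  0.251, 2.630]]

c = [-0.0207, -0.1836, -0.1265]     # x_4 = c_1 x_1 + c_2 x_2 + c_3 x_3

# b_ij = a_ij + a_i4 * c_j for i, j = 1, 2, 3.
B = [[A[i][j] + A[i][3] * c[j] for j in range(3)] for i in range(3)]
```

The entries agree with (6.27) to the four decimals carried in the hand computation.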
Suppose next that [y₁, y₂, y₃, y₄] denotes the transpose
of one of the two remaining eigenvectors x₃ or x₄. This
vector is orthogonal to x₁ and x₂, which are now known, so
that

−0.3119y₁ − 0.3650y₂ + 1.0000y₃ − 0.0530y₄ = 0,
 0.0207y₁ + 0.1836y₂ + 0.1265y₃ + 1.0000y₄ = 0.   (6.28)

Elimination of y₄ between these two equations gives

y₃ = 0.3087y₁ + 0.3529y₂.   (6.29)

On recalling the method of derivation of (6.26) we see that
(6.26) still hold if the xᵢ are replaced by yᵢ. If in (6.26a-c)
we eliminate y₃ by means of (6.29) we obtain

−0.2268y₁ − 0.4954y₂ = λy₁,
−0.3713y₁ + 0.6492y₂ = λy₂,
−0.2013y₁ + 0.0761y₂ = λy₃.   (6.30)
The first two equations represent an eigenvalue problem for
the 2 × 2 matrix

C = [ −0.2268  −0.4954 ]
    [ −0.3713   0.6492 ]

The easiest procedure is now to solve the eigenvalue
problem det(C − λI) = 0 by expansion of the
determinant and direct solution of the resulting quadratic.
Back-substitution in (6.29) and (6.28) gives the remaining
components of the eigenvectors of the original matrix A
and (6.30) is a check. This gives, on normalising so that
the largest elements are unity,
λ₃ = 0.8241,  x₃ = [−0.4714, 1.0000, 0.2074, −0.2001]';
λ₄ = −0.4017,  x₄ = [1.0000, 0.3533, 0.4334, −0.1741]'.
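The final step, expanding det(C − λI) = 0 and solving the quadratic, can be sketched as follows (Python, ours; the quoted target values are the book's):

```python
# The 2 x 2 problem for C reduces to lambda^2 - (tr C) lambda + det C = 0;
# its roots should match lambda_3 = 0.8241 and lambda_4 = -0.4017.
import math

C = [[-0.2268, -0.4954],
     [-0.3713,  0.6492]]

tr = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam3 = (tr + disc) / 2
lam4 = (tr - disc) / 2
```

The small residual differences from the quoted values reflect the four-decimal rounding carried through the hand computation.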
§6.4. The iterative solution of linear simultaneous
algebraic equations. It is convenient to consider certain
iterative methods for the solution of linear equations at this
point, since the convergence of these procedures can be
discussed by means of the theory of eigenvalues and eigen-
vectors. We start by stating two classical iterative pro-
cedures, considering for simplicity the case of three equations
in three unknowns. It is assumed that the elements in the
principal diagonal of the coefficient matrix are non-zero.
Without loss of generality these can be taken to be unity.
(a) The equations for the method of simultaneous
corrections are

x₁⁽ʳ⁺¹⁾ =            − a₁₂x₂⁽ʳ⁾ − a₁₃x₃⁽ʳ⁾ + b₁,
x₂⁽ʳ⁺¹⁾ = −a₂₁x₁⁽ʳ⁾            − a₂₃x₃⁽ʳ⁾ + b₂,   (6.31)
x₃⁽ʳ⁺¹⁾ = −a₃₁x₁⁽ʳ⁾ − a₃₂x₂⁽ʳ⁾            + b₃.

(b) The equations for the method of successive correc-
tions are

x₁⁽ʳ⁺¹⁾ =              − a₁₂x₂⁽ʳ⁾   − a₁₃x₃⁽ʳ⁾ + b₁,
x₂⁽ʳ⁺¹⁾ = −a₂₁x₁⁽ʳ⁺¹⁾              − a₂₃x₃⁽ʳ⁾ + b₂,   (6.32)
x₃⁽ʳ⁺¹⁾ = −a₃₁x₁⁽ʳ⁺¹⁾ − a₃₂x₂⁽ʳ⁺¹⁾            + b₃.

The new estimates of the xᵢ are used on the right
as soon as they are obtained.
A general class of iterative procedures, which includes
these methods as special cases, can be discussed in the
following way. We solve the equations

Ax = b,   (6.33)

where A is an n × n matrix, by splitting the matrix A in the
form A = E − F. Then

Ex = Fx + b,   (6.34)

and an iterative procedure is derived by writing

Ex⁽ʳ⁺¹⁾ = Fx⁽ʳ⁾ + b,   (6.35)

where x⁽ʳ⁾ is the rth approximation to the exact value x.
The iteration is started by setting x⁽⁰⁾ equal to an arbitrary
column matrix. If it is possible to obtain an approximation
to the exact solution x it may be convenient to take x⁽⁰⁾
equal to this estimate of x, but often it is sufficient to choose
x⁽⁰⁾ = 0. The matrix E should be chosen so that it is easy
to solve the set of equations (6.35) for x⁽ʳ⁺¹⁾, assuming that
x⁽ʳ⁾ is known. This implies that det E is non-zero, and E⁻¹
exists. We have seen in Chapter V that it is easy to solve
sets of equations if the matrix of coefficients is of lower
triangular form, and E is often chosen to be of this type.
An example is the method of successive corrections
quoted above.
Instead of working in terms of the estimated values of the
unknowns, it is useful to introduce the corrections

c⁽ʳ⁺¹⁾ = x⁽ʳ⁺¹⁾ − x⁽ʳ⁾.   (6.36)

As in the discussion of other types of iterative procedure in
Chapter II, if x⁽ʳ⁾ tends to the required value x as r tends
to infinity we say that the procedure is convergent. If the
procedure is convergent,

x = x⁽⁰⁾ + Σᵣ₌₁^∞ c⁽ʳ⁾.   (6.37)
A necessary condition for convergence of the infinite series
in this equation is that c⁽ʳ⁾ should tend to zero as r tends
to infinity. On writing r − 1 in place of r in (6.35) and
subtracting the result from (6.35), we see that

c⁽ʳ⁺¹⁾ = Kc⁽ʳ⁾,  where K = E⁻¹F.   (6.38)

Hence

c⁽ʳ⁺¹⁾ = Kʳ⁺¹c⁽⁰⁾.   (6.39)

The convergence of the procedure therefore depends on the
behaviour of Kʳ as r tends to infinity. This can be expressed
in terms of the size of the eigenvalues of K by the following
argument, which should be compared with the analysis at
the beginning of § 6.2. We consider only the case where
the matrix K possesses n linearly independent eigenvectors
zᵢ with corresponding eigenvalues λᵢ, so that

Kzᵢ = λᵢzᵢ,  or  (F − λᵢE)zᵢ = 0.   (6.40)
A'z, = ;.,:,., or (F-/.,E)z ( - 0. (6.40)
It is assumed that there is one eigenvalue of maximum
modulus, say ). u where ?., is real. The other eigenvalues
can be complex. From (6.9) the vector c (0) can be written
in the form
r<°>
fl,: l +fl 2 z 2 + ... + fl„:„.
(6.41)
From (6.39), (6.40), (6.41),
c ,r+l > = fll ;/ 1 +, 2l +fl 2 A 2 +| Z2 +...+ a „;: n +, v (6.42)
For large r, R, since )., is the eigenvalue of largest modulus,
cfr + » « «yg"«t, c (R+s) * X\c w . (6.43)
If we perform R iterations and then estimate c (R + ,) for
s>0 by means of this result, equation (6.37) gives
x = x (0) + £ c"+c*,
where
r "TF+ 1
c"> m c"°
S-l 1-A,
(6.44«)
(6.446)
We have assumed that |λ₁| < 1, and this is obviously a
necessary condition for convergence of the procedure.
This formula provides an estimate of the error at any
stage of the iterative procedure, provided that there is one
real eigenvalue λ₁ which is larger in modulus than the
others and the approximate value of λ₁ is known. From
(6.43) the value of λ₁ can be estimated from the ratio of
corresponding elements of successive c⁽ʳ⁾. The consistency
of the values of λ₁ obtained at various stages of the iteration
is an essential check on the validity of the basic assumption
that there is one eigenvalue which is larger in modulus
than the others.
We define the rate of convergence ρ by the formula

ρ = −ln |λ₁|,   (6.45)

where λ₁ is the eigenvalue of K with the largest modulus, as
defined in the last paragraph. To illustrate the significance
of ρ we introduce the error vector

e⁽ʳ⁾ = x⁽ʳ⁾ − x.

Subtraction of (6.33) from (6.35) gives

e⁽ʳ⁺¹⁾ = Ke⁽ʳ⁾ = Kʳ⁺¹e⁽⁰⁾.

As in the analysis leading to (6.43) we can show that for
large r the error in any component of the rth iterate will be
nearly equal to λ₁ʳp, where p is some constant. The number
of iterations required to reduce the error in this component
to, say, ε is given by

|λ₁|ʳ|p| = ε.

Taking logarithms,

r = ln(|p|/ε) / (−ln |λ₁|) = ln(|p|/ε) / ρ.

Hence the number of iterations required is inversely
proportional to the rate of convergence. The smaller the
value of |λ₁| the greater the rate of convergence.
We can compare the rates of convergence of different
iterative procedures for solving the same set of equations
if we can compare the magnitudes of the eigenvalues of
maximum modulus of the corresponding matrices. An
instructive elementary discussion of rates of convergence
can be given for the special case of a band or tridiagonal
matrix, the elements of which are zero except on the
principal and immediately adjacent diagonals. Without
loss of generality we can assume that the principal diagonal
elements are unity, so that we can set

A = L + I + U,

where I is the unit matrix and in the 3 × 3 case we can set

A = [ 1   u₁  0  ]    L = [ 0   0   0 ]    U = [ 0   u₁  0  ]
    [ l₁  1   u₂ ],       [ l₁  0   0 ],       [ 0   0   u₂ ].
    [ 0   l₂  1  ]        [ 0   l₂  0 ]        [ 0   0   0  ]

The methods of simultaneous and successive corrections,
equations (6.31), (6.32), can be written, respectively,

x⁽ʳ⁺¹⁾ = −(L + U)x⁽ʳ⁾ + b,   (6.46)
(L + I)x⁽ʳ⁺¹⁾ = −Ux⁽ʳ⁾ + b.   (6.47)
We examine a generalisation of (6.47) which, as explained
later, is known as the method of successive over-corrections:

(L + pI)x⁽ʳ⁺¹⁾ = −{(1 − p)I + U}x⁽ʳ⁾ + b,   (6.48)

where p is a constant which will be determined so as to yield
the fastest rate of convergence. The method of successive
corrections is the special case p = 1. We shall compare the
rates of convergence of (6.46) and (6.48). From (6.34),
(6.40), (6.45) this means that we need to compare the
eigenvalues λ, μ defined by the equations

(6.46): [(L + U) + λI]u = 0,   (6.49)
(6.48): [{(1 − p)I + U} + μ(L + pI)]v = 0.   (6.50)
By means of the following procedure we can show that
λ and μ are directly related. Write (6.50) out in detail.
Multiply the jth equation by αⱼ (j = 1, 2, 3) and set

P = (1 − p) + pμ,  vⱼ = βⱼVⱼ,  (j = 1, 2, 3),

where the Vⱼ are new unknowns. The αⱼ, βⱼ are constants
which are at our disposal. The equations (6.50) become

Pα₁β₁V₁ + u₁α₁β₂V₂ = 0,
μl₁α₂β₁V₁ + Pα₂β₂V₂ + u₂α₂β₃V₃ = 0,   (6.51)
μl₂α₃β₂V₂ + Pα₃β₃V₃ = 0.

We next identify these equations with (6.49). To make
the terms off the principal diagonal agree, we require

α₁β₂ = α₂β₃ = 1,
α₂β₁μ = α₃β₂μ = 1.

By means of these equations, four of the six αⱼ, βⱼ can be
expressed in terms of the remaining two. The way in which
this is done is immaterial. Suppose that we set

β₂ = 1/α₁,  β₃ = 1/α₂,  β₁ = 1/(μα₂),  α₃ = α₁/μ.

Then (6.51) become

(α₁/α₂)μ⁻¹PV₁ + u₁V₂ = 0,
l₁V₁ + (α₂/α₁)PV₂ + u₂V₃ = 0,
l₂V₂ + (α₁/α₂)μ⁻¹PV₃ = 0.

These are identical with (6.49) if

α₁/α₂ = √μ,  λ = P/√μ,

i.e.

P = (1 − p) + pμ = λ√μ.   (6.52)
First of all we consider the method of successive correc-
tions, p = 1. Then (6.52) gives μ = λ². This result shows
that in the case of a band matrix the eigenvalues for the
method of successive corrections are the squares of those
for simultaneous corrections. Hence for a given problem
involving a band matrix the two methods both converge or
they both diverge. If they converge, the rate of convergence
is doubled if we use successive instead of simultaneous
corrections.
To determine the value of p which gives the fastest rate
of convergence for any given λ we discuss the dependence
of the roots of the quadratic (6.52) for √μ on p. When the
roots are real we need to consider the root of larger modulus,
namely

|√μ| = (2p)⁻¹{|λ| + √(λ² − 4p + 4p²)},  p ≤ p₁ or p ≥ p₂,   (6.53)

where p₁, p₂ are the roots of the quadratic 4p² − 4p + λ² = 0.
For 0 < |λ| < 1 we have 0 < p₁ < ½ < p₂ < 1. When the roots
are complex,

|μ| = p⁻¹ − 1,  p₁ < p < p₂.   (6.54)

By drawing graphs of |μ| against p for various λ it is easy
to see that for convergence we need consider only p > ½,
and then |μ| decreases for ½ < p < p₂ and increases for
p > p₂. Also the largest |μ| corresponds to the largest
|λ|. Hence the rate of convergence is greatest when

p = p₂ = ½{1 + √(1 − λ₁²)},   (6.55)

where λ₁ is the root of maximum modulus for the method of
simultaneous corrections. From (6.53), (6.54) the eigen-
value of maximum modulus corresponding to the optimum p
is given by

μ₁ = (2p₂)⁻²λ₁² = p₂⁻¹ − 1
   = {1 − √(1 − λ₁²)} / {1 + √(1 − λ₁²)}.   (6.56)
The value of the method can be illustrated by supposing
that λ₁ = 1 − ε, where ε is small, so that the methods of
successive and simultaneous corrections converge slowly.
Then

μ₁ ≈ {1 − √(2ε)}{1 + √(2ε)}⁻¹ ≈ 1 − 2√(2ε).

If λ₁ = 0.995, ε = 0.005, then μ₁ ≈ 0.8 and a remarkable
increase in the rate of convergence is obtained.
We can rearrange (6.48) in the form

x⁽ʳ⁺¹⁾ = x⁽ʳ⁾ + ω{−Lx⁽ʳ⁺¹⁾ − (I + U)x⁽ʳ⁾ + b},   (6.57)

where ω = 1/p. If p = ω = 1 we have the method of
successive corrections, and the bracketed term on the right
is the correction added to x⁽ʳ⁾ to give x⁽ʳ⁺¹⁾. From (6.55),
since |λ₁| < 1, we have ½ < p < 1, or ω > 1, so that in the
general method the bracketed term is multiplied by a
factor greater than unity. This is the reason for naming
the procedure the method of successive over-corrections.

The following example illustrates numerically some
aspects of the iterative procedures described in this section.
An application of these methods is given in Ex. 11.9.
Ex. 6.4. Obtain the results given in Table 6.1 for the
iterative solution of the following equations, working to five
decimal places:

20x₁ −  8x₂              = 6,
−4x₁ + 20x₂ −  4x₃       = 6,   (6.58)
      − 4x₂ + 19x₃ − 4x₄ = 5,
            − 8x₃ + 20x₄ = 1.

In all cases the iteration was started with the estimates

20x₁⁽⁰⁾ = 6,  20x₂⁽⁰⁾ = 6,  19x₃⁽⁰⁾ = 5,  20x₄⁽⁰⁾ = 1.

The eigenvalue of maximum modulus was estimated from

λ₁ ≈ {Σᵢ₌₁⁴ cᵢ⁽ʳ⁺¹⁾} / {Σᵢ₌₁⁴ cᵢ⁽ʳ⁾},  r = 3, 4.
For the methods of simultaneous and successive corrections
the results in Table 6.1 give λ₁ ≈ 0.4 and λ₁ ≈ 0.2 respectively,
though the iteration in the latter case has not proceeded far
enough to give an accurate estimate. The c* in Table 6.1
                         Table 6.1
        Results for the iterative solution of (6.58).
        All figures have been multiplied by 10⁵.

Method            i    c⁽³⁾   c⁽⁴⁾   c⁽⁵⁾   c*     xᵢ      Estimated
                                                           error
Simult.           1    1550   727    239    160    49181   −30
correct. (6.31)   2    1818   597    299    200    48051   +24
                  3    1436   769    247    165    40889   −37
                  4    1835   574    308    206    21397   +27

Success.          1    1257   400     74     18    49214    +3
correct. (6.32)   2     999   185     31      8    48028    +1
                  3     525    83     13      3    40925    −1
                  4     210    33      5      1    21369    −1

Success.          1     918   297     15      1    49211
over-correct.     2     809    68      4           48028    +1
(6.57),           3     197    17      1           40926
ω = 1.044         4      47     5                  21370
are then obtained from (6.44b). For the method of succes-
sive over-corrections we use, from (6.55) with λ₁ = 0.4,

ω = 1/p = 2{1 + √0.84}⁻¹ ≈ 1.044.

In practice, of course, when ω is estimated, we should not
start from the beginning again with the original xᵢ⁽⁰⁾ as we
have done in Table 6.1, but should start from the
currently available estimates of the xᵢ.
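The comparison in Ex. 6.4 can be sketched on a machine. The Python fragment below (ours, not the book's; the division by the diagonal element handles the fact that (6.58) has not been scaled to unit diagonal, which the theory above assumed) runs the method of successive corrections and of successive over-corrections on (6.58); both should settle on the solution x = (0.49211, 0.48027, 0.40926, 0.21370) to five decimals.

```python
# Successive (over-)corrections applied to the system (6.58).

A = [[20.0, -8.0,  0.0,  0.0],
     [-4.0, 20.0, -4.0,  0.0],
     [ 0.0, -4.0, 19.0, -4.0],
     [ 0.0,  0.0, -8.0, 20.0]]
b = [6.0, 6.0, 5.0, 1.0]

def solve(omega, steps):
    """omega = 1 gives the method (6.32); omega > 1 the method (6.57)."""
    x = [b[i] / A[i][i] for i in range(4)]   # starting estimates of Ex. 6.4
    for _ in range(steps):
        for i in range(4):
            s = sum(A[i][j] * x[j] for j in range(4) if j != i)
            # each new component is used at once, as in (6.32)/(6.57)
            x[i] = x[i] + omega * ((b[i] - s) / A[i][i] - x[i])
    return x

x_gs = solve(1.0, 30)        # successive corrections
x_sor = solve(1.044, 30)     # over-corrections with the omega of Ex. 6.4
```

With thirty iterations both variants have converged to machine accuracy; on a fixed, small iteration count the over-correction run is visibly closer, as the table shows.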
In computing by hand, since the result is obtained by
adding corrections and the accuracy of the extrapolation
also depends on the accuracy of the corrections, it is
essential to apply checks. Consider the method of successive
corrections applied to a 3 × 3 set of equations as in (6.32).
On adding the equations and subtracting from the result
the corresponding equations with r − 1 in place of r, we have

(1 + a₂₁ + a₃₁)c₁⁽ʳ⁺¹⁾ + (1 + a₃₂)c₂⁽ʳ⁺¹⁾ + c₃⁽ʳ⁺¹⁾
        = −a₁₂c₂⁽ʳ⁾ − (a₁₃ + a₂₃)c₃⁽ʳ⁾,

where cᵢ⁽ʳ⁾ is the ith component of c⁽ʳ⁾. A final check should
also be applied by carrying out one iteration with the final
estimates of the xᵢ.
Examples VI
Ex. 6.5. Show that the sum of the eigenvalues of A equals
the sum of the diagonal elements of A, and that the product
of the eigenvalues is equal to det A. If the matrix A has
distinct non-zero eigenvalues λᵢ, show that A⁻¹, A² have
eigenvalues λᵢ⁻¹, λᵢ² respectively, and that A, A⁻¹, A² have
the same eigenvectors.
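The first part of Ex. 6.5 is easily illustrated on the matrix (6.10), whose eigenvalues were found in Ex. 6.1. The Python check below is ours, not part of the exercise.

```python
# Illustration of Ex. 6.5 for the matrix (6.10): sum of eigenvalues equals
# the trace, and their product equals det A.
import math

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
r2 = math.sqrt(2)
lams = [2 - r2, 2.0, 2 + r2]

trace = A[0][0] + A[1][1] + A[2][2]
# det of a 3 x 3 matrix by cofactor expansion along the first row
det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
       - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
       + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

sum_l = sum(lams)
prod_l = lams[0] * lams[1] * lams[2]
```

Here the trace is 6 and the determinant 4, and indeed (2 − √2) + 2 + (2 + √2) = 6 while (2 − √2)(2)(2 + √2) = 4.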
Ex. 6.6. Prove the following statements. The matrices
A − pI and A have the same eigenvectors. If η is an eigen-
value of A − pI then p + η is an eigenvalue of A. To find
the largest eigenvalue of A by the first-order iterative
procedure of § 6.2 it is in general quicker to iterate with
A − pI rather than A, for some suitable choice of p. Suppose
that λ₁ is positive and that the eigenvalues are arranged in
the order λ₁ > λ₂ ≥ … ≥ λₙ, with |λ₁| > |λ₂| and
|λ₁| > |λₙ|. The fastest rate of convergence is obtained
when p is chosen equal to ½(λ₂ + λₙ). For this choice of p
assumption (iii) of § 6.2 is no longer valid, and if it is
desired to use Aitken's δ²-process it may be better to
choose p = ½(λ₃ + λₙ). The difficulty in applying these
results is of course that λ₂, λ₃, λₙ are usually not known.
When using an automatic computer it may be worthwhile
to find a suitable value of p empirically. If it is known that
all the roots are positive then a simple choice for p is
½λ₁*, where λ₁* is an approximate value for λ₁ found by
iteration using A. The above device can be used to find
the smallest eigenvalue λₙ of A by choosing p = ½(λ₁ + λₙ₋₁)
or ½(λ₁ + λₙ₋₂). This procedure may not be very satis-
factory if the lowest roots are close together, which is often
the case.

Illustrate these remarks by discussing the case of a fourth-
order matrix with eigenvalues 4, 3, 2, 1.
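For the suggested fourth-order example the effect of the shift can be seen from the spectrum alone, since the convergence ratio of the first-order procedure is |second largest| / |largest| of the (shifted) eigenvalues. The Python sketch below is our illustration of the exercise, not the book's solution.

```python
# Effect of the shift of Ex. 6.6 on a spectrum 4, 3, 2, 1: the plain
# iteration converges like (3/4)^p; with p = (lam_2 + lam_n)/2 = 2 the
# shifted spectrum is 2, 1, 0, -1 and convergence is like (1/2)^p.

def ratio_after_shift(eigs, p):
    """Convergence ratio |second largest| / |largest| after shifting by p."""
    shifted = sorted((abs(e - p) for e in eigs), reverse=True)
    return shifted[1] / shifted[0]

eigs = [4.0, 3.0, 2.0, 1.0]
plain = ratio_after_shift(eigs, 0.0)   # no shift
best = ratio_after_shift(eigs, 2.0)    # p = (3 + 1)/2, the optimal choice
```

Note that with p = 2 the shifted eigenvalues 1 and −1 have equal modulus, which is exactly the failure of assumption (iii) mentioned in the exercise.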
Ex. 6.7. Show that if λ₁ = −λ₂ then, using the same
notation as in § 6.2,

u₂ₚ₊₁ ≈ λ₁²ᵖ⁺¹(a₁x₁ − a₂x₂).

Deduce that if uₚ is any element of uₚ then uₚ/uₚ₋₂ tends to
λ₁² as p tends to infinity, and show how x₁ and x₂ can be
found.

Show that if |λ₁| > |λ₂| and λ₂ = −λ₃, so that assump-
tion (iii) of § 6.2 is not true, Aitken's δ²-process can be
applied to even- or odd-order iterates, but not to any three
successive iterates.
Ex. 6.8. If the dominant eigenvalues of A are complex
conjugates, suppose that they are the roots of λ² + αλ + β = 0.
Show that for sufficiently large p

uₚ + αuₚ₋₁ + βuₚ₋₂ ≈ 0.

Deduce a method for finding α, β, and hence the dominant
eigenvalues of A.
Ex. 6.9. In order to understand the emphasis on
symmetrical matrices in this chapter we indicate how the
theory of orthogonality of eigenvectors must be modified
when A is unsymmetrical. Proofs of the following state-
ments are left to the reader. A and its transpose A' have
the same eigenvalues A, and we assume that these are
distinct. Corresponding to each A, there are eigenvectors
X, of A and j>, of A', and in general these are different:
Ax i = AjX|, A y t = A|)>j.
In general the x, are not orthogonal to each other, nor are
the y h but
ylxj = 0, (i + j).
We say that the x, and y, are biorthogonal. An expansion of
the form (6.9) exists for an arbitrary vector. Starting from
arbitrary vectors u , v we can form two sequences
u P = Au p _ l , t„ =^'b p .,.
If u p , v p are any elements of u„, Vp then
lim — ^- = A., lim
P-co u„_
^ = A,.
p-i
p-*to !>p_,
Further discussion of the unsymmetrical case lies outside
the scope of this book.
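The two-sequence iteration is easily checked on a small unsymmetrical example of our own (not the book's): $A = [[2, 1], [0, 1]]$ has eigenvalues 2 and 1. Iterating with A and with A′ both give $\lambda_1 = 2$, and the eigenvectors satisfy the biorthogonality relations.

```python
# Biorthogonal iteration of Ex. 6.9 on an assumed unsymmetric test matrix.
A = [[2, 1], [0, 1]]                       # eigenvalues 2 and 1

def mat_vec(a, u):
    return [sum(row[j] * u[j] for j in range(len(u))) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

u, v = [1.0, 1.0], [1.0, 1.0]
lam_u = lam_v = 0.0
for _ in range(60):
    u_new, v_new = mat_vec(A, u), mat_vec(transpose(A), v)
    lam_u = u_new[0] / u[0]                # element ratios -> lambda_1
    lam_v = v_new[0] / v[0]
    u = [x / max(u_new, key=abs) for x in u_new]   # normalise to avoid overflow
    v = [x / max(v_new, key=abs) for x in v_new]

# Biorthogonality: x_2 = [1, -1] (eigenvector of A for 1) and
# y_1 = [1, 1] (eigenvector of A' for 2) satisfy y_1' x_2 = 0.
y1, x2 = [1, 1], [1, -1]
dot = sum(a * b for a, b in zip(y1, x2))
```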
Ex. 6.10. Show by doing the necessary numerical
work that when finding the dominant eigenvalue of the
matrix (6.10) by iterating with $u_0' = [1, 1, 1]$ the rate of
convergence is more rapid than would be expected from the
ratio $\lambda_2/\lambda_1$. Show that if we use $u_0' = [1, 0, 0]$ the rate of
convergence of the eigenvectors is given by $\lambda_2/\lambda_1$ but the
rate of convergence of the eigenvalue is again more rapid
than might be expected. Explain the reasons for these
results.
150 NUMERICAL METHODS §6.4
Ex. 6.11. Show that the eigenvalues of the matrix
$$A = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1
\end{bmatrix}$$
are identical with the roots of the algebraic equation
$$\lambda^n + a_1\lambda^{n-1} + \ldots + a_n = 0.$$
(Many of the remarks made in previous chapters in con-
nection with finding the roots of algebraic equations apply
to the determination of eigenvalues.)
Ex. 6.12. Show that if x is a first-order approximation
to the eigenvector $x_i$ of A, so that $x = x_i + \varepsilon z$ where $\varepsilon$ is
small, then
$$L = \frac{x'Ax}{x'x} = \lambda_i - C\varepsilon^2 + \text{higher-order terms},$$
so that L is a second-order approximation to $\lambda_i$. The
quantity L is called the Rayleigh quotient.
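The second-order behaviour is easy to observe numerically (our test case: a symmetric $2 \times 2$ matrix with eigenvalues 3 and 1). Perturbing the eigenvector $[1, 1]$ by $\varepsilon$ in the direction of the other eigenvector, a simple element-ratio estimate is in error by $O(\varepsilon)$ while the Rayleigh quotient is in error by $O(\varepsilon^2)$.

```python
# Rayleigh quotient vs. an element-ratio estimate (Ex. 6.12 sketch, with an
# assumed test matrix). A = [[2, 1], [1, 2]]: eigenvalues 3 (x = [1, 1])
# and 1 (x = [1, -1]).
A = [[2, 1], [1, 2]]
eps = 1e-3
x = [1 + eps, 1 - eps]            # x_1 perturbed by eps in the x_2 direction

ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
rayleigh = sum(x[i] * ax[i] for i in range(2)) / sum(xi * xi for xi in x)
ratio = ax[0] / x[0]              # first-order estimate from one element

err_rayleigh = abs(rayleigh - 3)  # O(eps^2), about 2e-6 here
err_ratio = abs(ratio - 3)        # O(eps),   about 2e-3 here
```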
Ex. 6.13. Write computer programs for finding the
largest eigenvalue and the corresponding eigenvector of a
matrix A using the following methods:
(a) Iterate using (6.16) until successive estimates of $\lambda_1$
and the components of $x_1$ differ by less than a
quantity $\varepsilon$ specified in a " Read " statement. If the
procedure iterates more than 30 times without
achieving convergence, print out the last two sets
of results and stop the machine.
(b) Instead of the criterion for convergence in (a) use
the criterion that the quantity $t_p$ in (6.23) is
positive until the " noise-level " of the calculation
is reached (cf. Ex. 3.3).
(c) Use the method in (a) with the following addition.
Iterate until successive estimates of $\lambda_1$ differ by
less than $0.01\lambda_1^*$, where $\lambda_1^*$ is the current estimate
of $\lambda_1$. Compute the quantity $t_p$ in (6.23) and
continue the iteration with $(A - \tfrac{1}{2}\lambda_1^* I)$ (Ex. 6.6).
Check that the new value of $t_p$ is greater than
the old value. Otherwise continue iterating
with A.
(d) Use the method in (a) with the following addition.
Compute the quantity $t_p$ at each stage $(p \geq 3)$. If
$t_p$ and $t_{p-1}$ differ by less than $0.1\,t_p$ apply Aitken's
$\delta^2$-process and start the iterative procedure again.
Discuss precautions that could be taken to avoid
harmful effects due to the noise-level of the calculation in
cases (a), (c), (d). (The fact that the noise-level can upset
an orthodox iteration is one of the reasons why we may
prefer to use the noise-level itself as the criterion for con-
vergence, as in (b).)
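Part (a) might be organised along the following lines. This is a sketch in Python rather than the book's programming notation; the tolerance handling and all names are our assumptions. The iterate is normalised each step, successive estimates of $\lambda_1$ and of the components of $x_1$ are compared with $\varepsilon$, and the loop gives up after 30 iterations.

```python
# Sketch of Ex. 6.13(a): power iteration with a convergence tolerance eps
# and a 30-iteration cap (structure and names are our assumptions).
def largest_eigenvalue(a, eps=1e-10, max_iter=30):
    n = len(a)
    x = [1.0] * n
    lam_old, x_old = 0.0, list(x)
    for _ in range(max_iter):
        v = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(v, key=abs)            # current eigenvalue estimate
        x = [vi / lam for vi in v]       # normalised eigenvector estimate
        if abs(lam - lam_old) < eps and all(
                abs(x[i] - x_old[i]) < eps for i in range(n)):
            return lam, x, True          # converged
        lam_old, x_old = lam, list(x)
    return lam_old, x_old, False         # caller prints last results and stops

lam, x, ok = largest_eigenvalue([[2, 1], [1, 2]])
```

A real program would also print the last two sets of results on failure, as the exercise specifies; here the `False` flag stands in for that action.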
Ex. 6.14. Iterative methods for the solution of Ax = b
are particularly useful if A is diagonally dominant, i.e. if
in any row of the matrix the diagonal element is greater
than the sum of the absolute values of the non-diagonal
elements. Consider the method of simultaneous correc-
tions. Without loss of generality we can take the diagonal
elements to be unity. If we define
$$\sigma_i = \sum_j |a_{ij}|, \qquad \sigma = \max_i \sigma_i,$$
where the sum is taken over all non-diagonal elements only,
then for diagonal dominance $\sigma < 1$. Let $e^{(r)}(i)$ denote the
$i$th element of the error vector $e^{(r)} = x - x^{(r)}$. If we define
$E_r$ to be the maximum value of $|e^{(r)}(i)|$ for $i = 1, 2, \ldots, n$ then
$$|e^{(r)}(i)| \leq \sum_j |a_{ij}|\,|e^{(r-1)}(j)| \leq \sigma_i E_{r-1} \leq \sigma E_{r-1}.$$
Hence
$$E_r \leq \sigma E_{r-1} \leq \sigma^r E_0.$$
Hence the method of simultaneous corrections converges
for diagonally dominant matrices and for such matrices
the modulus of the largest eigenvalue is less than $\sigma$.
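A minimal sketch of the method on a diagonally dominant system (the test matrix, with unit diagonal and $\sigma = 0.5 < 1$, is our own). With unit diagonal each sweep replaces every component simultaneously using only the previous iterate, and the bound $E_r \leq \sigma^r E_0$ guarantees convergence.

```python
# Simultaneous corrections for Ax = b with unit diagonal, per Ex. 6.14;
# the 3x3 test system is an assumed example.
A = [[1.0, 0.3, 0.2],
     [0.1, 1.0, 0.3],
     [0.2, 0.1, 1.0]]
x_true = [1.0, 2.0, 3.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

# sigma = max row-sum of off-diagonal moduli; here 0.5 < 1.
sigma = max(sum(abs(A[i][j]) for j in range(3) if j != i) for i in range(3))

x = list(b)                           # starting approximation
for _ in range(80):
    # All components corrected simultaneously from the previous iterate.
    x = [b[i] - sum(A[i][j] * x[j] for j in range(3) if j != i)
         for i in range(3)]

err = max(abs(x[i] - x_true[i]) for i in range(3))
```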
Ex. 6.15. If $X_0$ is an approximate inverse of A, show
that if we define
$$X_{r+1} = X_r(2I - AX_r), \qquad r = 0, 1, 2, \ldots,$$
then as r tends to infinity $X_r$ tends to $A^{-1}$. The procedure is
second-order. This is an analogue, for matrices, of an
iterative procedure for finding the inverse of a scalar,
$x = 1/a$. (Set $n = -1$ in the first formula in Ex. 2.10.)
A necessary condition for convergence of the scalar pro-
cedure is that $|1 - ax_0| < 1$. The analogue for the matrix
procedure is that the largest eigenvalue of the matrix
$I - AX_0$ must be less than unity in modulus.
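The iteration can be tried as follows (the $2 \times 2$ test matrix and the crude starting guess $X_0 = \tfrac{1}{4}I$ are our assumptions; the eigenvalues of $I - AX_0$ are then about 0.10 and 0.65, within the stated condition). Since convergence is second-order, a dozen iterations are ample.

```python
# Matrix inversion by X_{r+1} = X_r (2I - A X_r) from Ex. 6.15
# (assumed test matrix: A = [[2, 1], [1, 3]], inverse [[0.6, -0.2], [-0.2, 0.4]]).
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2.0, 1.0], [1.0, 3.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.25, 0.0], [0.0, 0.25]]        # X_0 = I/4 satisfies the condition on I - A X_0
for _ in range(12):
    AX = mat_mul(A, X)
    X = mat_mul(X, [[2 * I[i][j] - AX[i][j] for j in range(2)]
                    for i in range(2)])

expected = [[0.6, -0.2], [-0.2, 0.4]]
err = max(abs(X[i][j] - expected[i][j]) for i in range(2) for j in range(2))
```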
Ex. 6.16. Write a computer program for the iterative
solution of the equations Ax = b using the method of
successive corrections. Start with the approximation
$x_i^{(0)} = b_i/a_{ii}$. At each stage of the iteration estimate the
eigenvalue of maximum modulus by means of the formula
$$\lambda_1^{(r+1)} = \frac{c_i^{(r+1)}}{c_i^{(r)}}.$$
($c_i^{(r)}$ is the $i$th component of $c^{(r)}$ defined in (6.36). This
formula can be justified by (6.43).) When successive
estimates $\lambda_1^{(r)}$, $\lambda_1^{(r+1)}$ are such that
$$|\lambda_1^{(r+1)} - \lambda_1^{(r)}| < \delta\,|\lambda_1^{(r+1)}|,$$
where $\delta$ is a given number, say 0.05, adopt $\lambda_1^{(r+1)}$ as the
required estimate of the eigenvalue of maximum modulus
$\lambda_1$. Correct the estimate of x by means of (6.44). Iterate
four times, starting from the corrected estimate of x, and
again apply (6.44). (Do not re-estimate $\lambda_1$.) Repeat this
procedure until the maximum element of the correction
vector C defined in (6.44b) is less than a specified number $\varepsilon$.
What provision would you make for non-convergence of
the procedure and for non-validity of the correction formula
(6.44)? What information would you print out if the
procedure does not converge, to indicate why the method
has failed? (Ex. 11.9 is concerned with a computer program
for the method of successive over-corrections.)
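One possible shape for such a program is sketched below. This is hedged heavily: we do not reproduce formulas (6.36), (6.43) and (6.44), so the sweep is a successive-corrections pass (each component corrected in turn using the latest values), the eigenvalue estimate is the ratio of successive corrections, and the correction step uses the standard geometric-series extrapolation as a stand-in for (6.44); the test system and all names are our assumptions.

```python
# Hedged sketch of Ex. 6.16: successive corrections with an eigenvalue
# estimate from the correction vector, and a geometric-series extrapolation
# standing in for the book's correction formula (6.44).
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
x_true = [1.0, 2.0, 3.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
n = 3

x = [b[i] / A[i][i] for i in range(n)]        # x_i^(0) = b_i / a_ii
corrections = []
for _ in range(4):
    old = list(x)
    for i in range(n):                        # one successive-corrections sweep
        x[i] = (b[i] - sum(A[i][j] * x[j]
                           for j in range(n) if j != i)) / A[i][i]
    corrections.append([x[i] - old[i] for i in range(n)])

# Ratio of successive corrections estimates the dominant eigenvalue of the
# iteration matrix (0.125 for this test system).
lam = corrections[-1][0] / corrections[-2][0]

err_plain = max(abs(x[i] - x_true[i]) for i in range(n))
# Stand-in correction: remaining error is roughly c * lam / (1 - lam).
x_corr = [x[i] + lam / (1 - lam) * corrections[-1][i] for i in range(n)]
err_corr = max(abs(x_corr[i] - x_true[i]) for i in range(n))
```

On this system the extrapolated iterate is far more accurate than the plain one, which is the point of the correction step in the exercise.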
INDEX
Acceleration of convergence, 34,
133
Accuracy, 1-18, 41, 95-6, 117 (see
also Errors)
Aitken's δ²-process, 33-6, 133
Algebraic equations, 19-53, 78
Automatic computer (see Com-
puter, automatic)
Auxiliary matrix, 109
Back-substitution, 85, 109, 114
Bairstow's method, vii, 46-9
Band matrix, 119, 142
Biorthogonality, 130, 149
Characteristic roots (see Eigen-
values)
Chebyshev series, 15
Checks, 16, 23, 82-5, 107, 133,
147
Column matrix (vector), 103
Compact elimination method,
105-112
Complex roots of polynomials,
46-9
eigenvalues, 148
Computer, automatic, v, 54-5
and hand computing, 75-8
programming, 54-79
programs (see Programs)
Condition, of algebraic equations,
41-6
of linear equations, 92-8, 117-18
Convergence of iteration, 21
acceleration of, 34, 133
criterion, 21, 65, 72
order of, 31
rate of, 26, 28, 141
Cramer's rule, 80, 124
Curve fitting, 120
δ²-procedure, 33-6, 133
Dependence of vectors, 128
Desk machine calculation, 11-15
and automatic computing, 75-8
square roots, 30
Determinants, 80, 88, 93, 94, 112,
120, 124, 128
Diagonal matrix, 104
Diagonally dominant matrix, 151
Digital computer (see Computer,
automatic)
Direct and iterative methods, 81
Eigenvalues and eigenvectors,
122-153
calculation of largest, 129-135
calculation of subdominant,
135-8
properties, 126-8, 147-150
Equations (see Algebraic or
Linear equations)
Error, viii, 1-18 (see also Round-
off, Ill-conditioning)
absolute and relative, 3
inherent, in linear equations,
92-8, 117-18
in iterative procedures, 31-7
in iterative solution of linear
equations, 140
in polynomial equations, 41-6
in sums and products, 3-5
Expansion in orthogonal vectors,
128
Extrapolation, 33-6, 133, 140
False position, 51
Fixed and floating point, 6, 18
Flow-diagram, 62, 64, 74, 100
Hand-computing (see also
Checks)
compared with automatic, 75-8
desk machine calculation, 11-15
Hermitian matrix, 130
Ill-conditioning,
algebraic equations, 27, 42-6
linear equations, 92-8, 117-18
Inherent error, 92 (see also Ill-
conditioning)
Inner product, 127 (see also
Scalar product)
Interpolation, linear, 12
Inverse matrix, 112-18
computation of, 113-15
Inverse matrix, iterative method
for computing, 152
left and right, 116
Iterative methods, 19-53
convergence and divergence, 21
for inverse matrix, 152
for largest eigenvalue, 129-135
for linear equations, 81, 138-
147
order of, 31
programs for, 65-75
Latent roots (see Eigenvalues)
Least squares, 120
Linear dependence and inde-
pendence, 128
Linear equations, 80-101, 105-
118, 121
compact method, 105-112
errors, 92-8, 117-18
iterative methods, 138-147
successive elimination, 82-7
Linear interpolation, 12
Matrix, 102-122
band or tridiagonal, 119
eigenvalues of, 125
inverse, 112-18
left and right inverse, 1 16
singular and nonsingular, 113
Memory, 54
Mistakes, 15-17, 23 (see also
Checks)
Newton-Raphson iterative pro-
cedure, 26-30, 70-5
Noise-level, x, 10, 66, 72, 151
Nonsingular matrix, 113
Normalisation of vector, 128
Operator methods, viii
Order, of iterative procedure, 31
Orthogonal vectors, 127-8
Pivots, 87, 115
Polynomial equations, 38-50
Polynomials, evaluation of, 13-15
Principal diagonal, 105
Programming, 54-79
Program, 55
for eigenvalues, 150
for iterative procedures, 65-75,
78-9
for iterative solution of linear
equations, 153
for method of successive
elimination, 98-100, 119
for roots of polynomials, 72-3
Proper values (see Eigenvalues)
Rate of convergence, 26, 28, 141
(see also Convergence)
Rayleigh quotient, 150
Residual, 42, 75, 94
Round-off, 2-10
  noise (see Noise-level)
Row matrix (vector), 103
Scalar product, 103 (see also
Inner product)
Scaling, 6, 82, 87, 116
Significant figures, 2
Simultaneous equations (see
Linear equations)
Singular matrix, 113
Square root, 30, 66
Square root method for linear
equations, 119
Statement, 56
Statistical theory of roundoff
error, 9
Successive corrections, method
of, 138
Successive elimination, 82-7
Symmetrical matrix, 105
t_p, definition of, 36
Transcendental equations, 19-38,
50-1, 78
Transpose of matrix, 105
Triangular matrix, 105, 142
Tridiagonal matrix, 119, 142
Unit matrix, 105
Vector, 103
  expansion in terms of ortho-
gonal functions, 128
PRINTED IN GREAT BRITAIN BY OLIVER AND BOYD LTD., EDINBURGH
UNIVERSITY MATHEMATICAL TEXTS
Numerical Methods: 1
Iteration
Programming and
Algebraic Equations
It is desirable that students of
applied mathematics, the physical sciences, and
engineering should be given an introduction to
numerical analysis. The aim of this new title in
the University Mathematical Texts series is to
provide an elementary exposition of certain
basic ideas. This has been done by
considering a strictly limited selection of
typical methods with particular emphasis on
fundamentals, including error analysis,
error estimation, pitfalls and the detection of
mistakes. An effort has been made to avoid the
common practice of regarding numerical
methods as a collection of algorithms or
recipes. Since the advent of automatic computers
has produced a revolution in computing,
programming for computers has been made an
integral part of the exposition.
Volume 1 of this title deals with accuracy and
error, iterative methods, algebraic
equations, matrix methods and eigenvalues.
Volume 2 deals with numerical methods for
solving ordinary and partial differential equations.
Net price 10s6d